Chinese AI firm DeepSeek has introduced DeepSeek-OCR, an open-source model that employs optical compression to extract and condense text from images and PDFs. This technology supplies extensive, high-quality training datasets for large language and vision-language models, while also demanding significantly less computing power.
Released on GitHub yesterday, DeepSeek-OCR uses an optical compression technique rooted in vision-language models to address the computational burden large language models face when handling lengthy text, as detailed in the accompanying paper "Contexts Optical Compression," published the same day.
The approach drastically reduces the number of text tokens by rendering textual information as compact visual representations. According to the paper, a single GPU with 40 GB of memory can generate more than 200,000 pages of training data per day.
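For illustration only, the core idea can be sketched in a few lines of Python: render a page of text as an image and compare a rough text-token count against a fixed budget of vision tokens. The whitespace tokenization, page size, and 100-token vision budget are assumptions for the sketch, not figures from DeepSeek's paper or code.

```python
# Minimal sketch of the optical-compression idea, not DeepSeek-OCR's pipeline:
# render text onto a page image and estimate the compression ratio as
# (text tokens) / (vision tokens the encoder is allowed to emit).
import textwrap
from PIL import Image, ImageDraw

def render_page(text: str, size=(1024, 1024)) -> Image.Image:
    """Draw the text onto a blank page image, as a stand-in for a PDF page."""
    page = Image.new("RGB", size, "white")
    ImageDraw.Draw(page).multiline_text(
        (16, 16), "\n".join(textwrap.wrap(text, width=110)), fill="black")
    return page

passage = ("Long documents such as research papers, contracts and reports "
           "can run to thousands of text tokens per page. ") * 70

page = render_page(passage)
text_tokens = len(passage.split())   # crude proxy for an LLM tokenizer
vision_token_budget = 100            # assumed size of the compressed visual representation

print(f"page image {page.size}: ~{text_tokens} text tokens -> "
      f"{vision_token_budget} vision tokens "
      f"(~{text_tokens / vision_token_budget:.0f}x compression)")
```

The sketch only makes the bookkeeping concrete: the claimed savings come from the vision encoder squeezing a whole rendered page into far fewer tokens than the raw text would need.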
The model reaches over 96% accuracy at a compression ratio of around 10x, roughly 90% accuracy at ratios between 10x and 12x, and about 60% accuracy at around 20x. These results demonstrate that compact language models can learn to interpret compressed visual data effectively, suggesting that larger models could develop similar capabilities.
DeepSeek-OCR can compress lengthy content, for instance by converting dialogue histories into images, improving large language models' ability to handle sizable documents such as research papers, legal contracts, and financial reports.
By converting text into images and compressing them, the system mimics the way human memory fades, letting older textual detail gradually degrade in order to improve the efficiency of large language models.
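A toy way to picture this "memory decay" idea, again as an assumption-laden sketch rather than DeepSeek's implementation, is to keep older conversation turns as progressively smaller images so they occupy fewer vision patches than recent ones. The scale schedule and 16-pixel patch size below are illustrative.

```python
# Toy illustration of fading older context: downscale older turn images more
# aggressively, so they cost fewer vision patches than the newest turn.
from PIL import Image

PATCH = 16  # assumed ViT-style patch size

def patches(img: Image.Image) -> int:
    return (img.width // PATCH) * (img.height // PATCH)

# Pretend each dialogue turn was already rendered as a 1024x1024 page image.
history = [Image.new("RGB", (1024, 1024), "white") for _ in range(4)]

scales = [0.25, 0.5, 0.75, 1.0]  # oldest -> newest; newest stays full size
for i, (page, s) in enumerate(zip(history, scales)):
    shrunk = page.resize((int(page.width * s), int(page.height * s)))
    print(f"turn {i}: {shrunk.size}, {patches(shrunk)} vision patches")
```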
Since its debut, the OCR model has quickly gained over 1,400 stars on GitHub. Some in the industry note, however, that the company has been slow to release newer models such as R2, prompting speculation that it is falling behind. Others believe DeepSeek is instead consolidating its core capabilities in preparation for its next-generation models.