OCR and redaction with Qwen 3.5

March 2026

Overview

The ‘doc_redaction’ project is an open source GUI and CLI application for OCR and redaction tasks for PDF documents, images, and tabular data. The code for the application can be found here. Initially the app was based on ‘traditional’ OCR such as Tesseract and PaddleOCR, with spaCy for entity recognition, alongside API calls to AWS services such as Textract and Comprehend.

Recently the app has incorporated local vision language models (VLMs) for OCR and redaction tasks with on-device deployment. The particular difficulty of the document redaction task compared to most OCR uses is that the specific bounding box location of words is essential for successful redaction, meaning that until now, VLMs have struggled to be useful in this context. Improvements in the past year in both closed and open source VLM models have changed the situation.

In February 2026, I wrote an article looking into the use of VLMs for OCR and redaction tasks in documents. The original article, using Qwen 3 VL 8B Instruct, can be found here. In late February 2026, Qwen 3.5 was released and I wanted to see how it compared to Qwen 3 for OCR/redaction tasks. Here I will test Qwen 3.5 for OCR/redaction on three ‘difficult’ tasks with an updated version of the app (version 2.0.1, which you can test out for yourself here).

We will be trying the models with the following tasks:

OCR of a difficult handwritten note
Face detection (with bounding box locations) on a document page
Custom entity identification on open text (using LLMs)

My conclusion from the previous article was that PaddleOCR for initial OCR, paired with Qwen 3 VL 8B Instruct for low confidence phrases was the best solution for OCR of ‘difficult’ pages in documents (e.g. with difficult handwriting). In this post, I will use the following models (along with instructions on how you can try them yourself):

Qwen 3.5 9b 4 bit quantised, deployed via vLLM.
Qwen 3.5 35B A3B 4 bit quantised, deployed via llama.cpp.
Qwen 3.5 27B 4 bit quantised, hosted on the Document Redaction VLM space on Hugging Face here. You can try this model directly with the examples below at this URL without deploying your own server.

Deploying Qwen 3.5 35B A3B 4 bit quantised (llama.cpp)

For llama.cpp, use the Docker Compose file in the main doc_redaction repository here. Use command ‘docker compose -f docker-compose_llama.yml –profile 27b up -d’. Recommended VRAM is 24GB, but can be adjusted downwards with the ‘–n-gpu-layers’ or ‘–n-cpu-moe’ parameters (as relevant)in the docker compose file. See the Unsloth guide to Qwen 3.5 deployment for more details.

Deploying Qwen 3.5 9B 4 bit quantised (vLLM)

Deploy with the docker compose file here. Use command ‘docker compose -f docker-compose_vllm.yml –profile vllm-9b up -d’, needs 16-17GB GB VRAM. Further instructions for use of vLLM can be found here. For less VRAM usage, you can deploy the 9B model also using an Unsloth GGUF file, similar to the approach for the 35B model above.

Example 1: Difficult handwritten note

To try the Qwen 3.5 27B model with this example, at the Hugging Face space, click the third example name, ‘Unclear text on handwritten note’, change the local OCR model option below to ‘vlm’. Then click on ‘Redact document’.

The first example is a note with handwriting that is hard to decipher even for a person.

Qwen 3 VL 8B Instruct

Qwen 3 VL 8B Instruct alone

As a reminder, Qwen 3 VL 8B Instruct alone found this:

Unclear text on handwritten note analysed with Qwen 3 VL

Rating: 7/10 - the text identification is pretty good, but the model ignored the two text lines on the left (7/10). Bounding boxes are generally accurate, but some do not cover the entire line (7/10).

Hybrid PaddleOCR + Qwen 3 VL

A hybrid approach of PaddleOCR + Qwen 3 VL 8B Instruct found this:

Rating: 7.75/10 - the text identification is a bit worse than the VLM model alone (7.5/10). This is improved by the fact that due to the hybrid approach, all the bounding boxes have been identified, and the text is generally correctly (8/10).

My conclusion from the last article was that PaddleOCR for initial OCR, paired with Qwen 3 VL 8B Instruct for low confidence phrases was the best solution for ‘difficult’ pages in documents. This was mainly due to the ‘laziness’ of the VLM model to identify text in the document - note that Qwen 3 VL 8B Instruct ignored the two text lines on the left. On pages with lots of text, I find that this pattern is repeated - the VLM will tend to miss some text.

Qwen 3.5 9B (vLLM)

VLM alone

Rating: 8/10 - the text identification is generally good, and it has included the two text lines on the left that were missed by Qwen 3 VL 8B Instruct, however and the text is not fully correct (8/10). For bounding boxes, it did miss one text box in the middle of the page (8/10).

Hybrid PaddleOCR + Qwen 3.5 9B

Rating: 7.75/10 - The bounding boxes are all located correctly (8/10), but the text identification is not that great - particularly noting the Cyrillic characters identified near the bottom of the page (7.5/10).

Qwen 3.5 35B A3B (llama.cpp)

VLM alone

Rating: 8/10 - the text identification is generally good, but I would say it slightly worse than the 9B model (8/10). No text boxes have been missed, however some boxes seem too large and overlap their neighbours (8/10).

Hybrid PaddleOCR + Qwen 3.5 35B A3B

Rating: 8.25/10 - The text identification is slightly better than for the 9B model in the hybrid approach, but still not perfect (particularly near the bottom) (8/10). The improvement in text identification results also in a better word match to bounding boxes (8.5/10).

Qwen 3.5 27B (HF space)

Qwen 3.5 is served on the Hugging Face space here. Click the third example name, ‘Unclear text on handwritten note’, change the local OCR model option below to ‘vlm’. Then click on ‘Redact document’.

VLM alone

Rating: 8.5/10 - The output as generally very good. This time, the VLM model has identified all the text in the document (8.5/10). The position of the bounding boxes are also generally correct, with less overlap between lines than seen with the 35B A3B model (8.5/10). I have not given a higher score as I have seen with other quants (namely the llama.cpp deployment of the model that you can run here), the model does seem to miss a line or two of text. The unreliability/laziness of the model is still an issue.

Hybrid PaddleOCR + Qwen 3.5 27B

Rating: 7.75/10 - The text identification is noticeably worse than using the VLM alone (Cyrillic characters identified near the bottom) (7.5/10). The bounding boxes are also generally worse, with more overlap between lines than seen with the VLM alone, and some very small word boxes are present (8/10).

Conclusion

Model	Text Identification	Bounding Boxes	Overall rating
Qwen 3 VL 8B Instruct	7/10	7/10	7/10
Hybrid PaddleOCR + Qwen 3 VL	7.5/10	8/10	7.75/10
Qwen 3.5 9B	8/10	8/10	8/10
Hybrid PaddleOCR + Qwen 3.5 9B	7.5/10	8/10	7.75/10
Qwen 3.5 35B A3B	8/10	8/10	8/10
Hybrid PaddleOCR + Qwen 3.5 35B A3B	8/10	8.5/10	8.25/10
Qwen 3.5 27B	8.5/10	8.5/10	8.5/10
Hybrid PaddleOCR + Qwen 3.5 27B	7.5/10	8/10	7.75/10

Overall, the Qwen 3.5 27B model alone (i.e. not using the hybrid approach with PaddleOCR) performs best on this task for identifying difficult handwriting. However, the issue with model ‘laziness’ in terms of missing lines of text in its response still persists, preventing me giving it a near perfect score.

Example 2: Face identification

The next task is to accurately identify the location of people’s faces on a document. The document can be found here.

Since the pages contain a lot of typed text, pure VLM analysis is not necessary, and would likely worse due to the general VLM ‘laziness’ in terms of missing lines of text in their response. So page OCR is conducted with a hybrid PaddleOCR + VLM model approach (for any low confidence lines). Afterwards, a second VLM pass looks specifically for photos of people’s faces present on the page, and creates a bounding box for each face.

The example page contains two photos of faces, one on the left side of the page, and one on the right side, and also two cartoon drawings of peoples. So this test also tests the VLM’s ability to following instructions to distinguish between photos of faces and cartoon drawings, as well as locating them.