PenParse is a hand-writing OCR tool that can transcribe photos and scans of hand-written notes to markdown-formatted documents and synchronise their content with PKM packages.
Like all documents in my digital garden, this is a living note which could change over time and is definitely not “finished”
Goals
The primary goals of the project are:
- Provide free or cheap hand writing recognition capabilities with easy-to-use integration into common notes applications including Obsidian, Joplin, Memos and others.
- Provide a simple, easy-to-grok, low-friction user experience by supporting image uploads directly via web browser or via common chat apps like Telegram and Discord.
- Provide a privacy-centric, self-hostable open source service and a flagship service that offers users a dignified service and respects their data privacy for a very modest fee (to cover hosting cost, dev time and donate to other FOSS projects).
- Provide a respectful service that keeps data private and deletes things once they’re no longer needed.
Secondary goal: assemble an entirely optional “opt-in” OCR dataset which can be used to fine tune and further enhance local-first handwriting detection capability.
Why Build PenParse?
- I like to combine digital PKM with manual writing. Writing without a computer can be a useful outlet for me without having to look at a screen. I can do it without my phone which means I don’t need to worry about being distracted by notifications. There’s more friction to multi-tasking (I’d have to go fetch my phone or laptop and unlock them).
- The technology is ready, we can use local models to process images without having to send them to data-hungry companies. The task can be achieved well enough that it’s useful but it’s still low-stakes enough that errors can be corrected in post-processing without any major issues.
- There seems to be demand for it and Joplin are doing something similar.
Licensing
The project itself is copyleft open source (AGPL) and self-hostable but a flagship instance of the software will be provided with very affordable membership options for users who are not able or do not want to host the software themselves.
If enough people opt in to build a dataset, this should be published using a permissive creative commons style license.
Data Protection and Usage
Core Application
The tool works by allowing users to upload photos of hand written notes for analysis by an OCR model and responding with a text document. These images and corresponding documents contain sensitive information. Therefore we need to be very careful with them only use them for the stated purpose of extracting text from the image and sending it to the user’s personal knowledge silo.
- Users must not be able to access each other’s records or images directly or indirectly.
- Once processing is complete, data should be wiped from the server after a reasonably short period (let’s say 7 days). Users will have a full copy of the scanned note in their local PKM anyway.
- For self-hosters, the need to partition access between users is (likely) diminished although small coops or groups of friends using the same hosted instance cannot be discounted.
Data and Licensing Considerations for the OCR Model
Choice of model is deliberately decoupled from the app itself. Any OpenAI-style REST api is supported. This allows the user the freedom to make decisions about which model they want to use and consider the pros and cons of their decision.
Users may opt to send their data to Anthropic Claude or GPT-4 but should understand that doing so involves sending their notes and the contents of those notes to an AI company over the open internet (albeit with TLS encryption). Alternatively models like Qwen2-VL are now able to reliably perform handwriting recognition on a consumer GPU and can be feasibly self-hosted for a relatively low cost.
Currently there are no “fully open” multi-modal models that can perform well at this task so it is impossible to disentangle ethical concerns about
Public Handwriting OCR Dataset
A secondary goal of this project is to build a public dataset of handwriting OCR images and corresponding outputs.
- No dark patterns. Contributing to this dataset should be entirely opt-in and should be off by default to avoid users unintentionally submitting sensitive content. A warning should be visible when the user turns this on.
- Submitted content should be manually reviewed before it is added to the dataset. Initially reviewers may be volunteers but aspirationally we’d like to pay them from the proceeds of core app subscriptions.
- Being added to a dataset is quite permanent. Of course if someone requests that their data be purged, we could do our best to find and remove it but an older dataset may have already been downloaded by other users. Therefore it seems reasonable that we would have a waiting period of the order of days - before examples that have been approved for use by an application user are actually published to the dataset.
- Published data also represents a GDPR risk (e.g. a user made notes about their meeting and wrote down the full names of the attendees) so some sort of filtering/cleanup needs to be done here.
Architecture
- PenParse is designed as a django web application that uses Celery to communicate with a worker process which carries out the OCR workflow asynchronously.
- Users can interact with PenParse directly via the web UI or via the plugin API from inside their PKM tool.
- Uploaded images are stored on the filesystem. User data is stored in a SQLite database.
- The worker supports OpenAI-compatible REST APIs such as vllm, ollama or external model providers.
graph TB
subgraph Client
WebUI[Web UI]
PluginAPI[Plugin API]
end
subgraph PenParse
Django[Django Server]
Celery[Celery]
Worker[Worker]
SQLite[(SQLite Database)]
FileSystem[(File System)]
end
subgraph ExternalServices
VLLM[VLLM Server]
end
WebUI --> Django
PluginAPI --> Django
Django --> SQLite
Django --> FileSystem
Django <--> Celery
Celery <--> Worker
Worker --> VLLM
Worker --> SQLite
classDef server fill:#f9f,stroke:#333,stroke-width:2px;
classDef storage fill:#bbf,stroke:#333,stroke-width:2px;
classDef external fill:#bfb,stroke:#333,stroke-width:2px;
class Django,Celery,Worker server;
class SQLite,FileSystem storage;
class VLLM external;
Architectural Decisions
The application was designed as a web service to make it easier to integrate into a number of different chat services and PKM applications without a significant amount of engineering overhead to rewrite and re-engineer. For example, a paradigm that was considered was just embedding the whole app as an Obsidian plugin. However, this might make integration with Telegram bots or Joplin significantly more difficult and also prevents users using the service if they only have access to a mobile phone and their laptop or workstation is off at home.