
spaCy is an open-source, multi-purpose NLP pipeline and framework.

spaCy “Memory Leak”

Long-running spaCy processes can appear to leak memory: their RAM usage grows steadily over time. According to this discussion, the cause is the vocabulary map, which grows whenever the model encounters words that were not in its training set, and this is intentional behaviour. The authors recommend regularly restarting processes that use spaCy if this is a problem.
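The growth described above can be observed by counting the entries in the pipeline's string store before and after processing unseen tokens. This is a minimal sketch, not a profiling tool: `string_store_growth` and `demo` are hypothetical helper names, the made-up tokens are stand-ins for out-of-vocabulary words, and it assumes that `len(nlp.vocab.strings)` is a reasonable proxy for vocabulary size.

```python
from typing import Iterable, Tuple


def string_store_growth(nlp, texts: Iterable[str]) -> Tuple[int, int]:
    """Return the number of string store entries before and after processing texts."""
    before = len(nlp.vocab.strings)
    for text in texts:
        nlp(text)  # processing a text adds any unseen strings to the vocab
    after = len(nlp.vocab.strings)
    return before, after


def demo() -> None:
    # Requires spaCy to be installed; a blank pipeline needs no model download.
    import spacy

    nlp = spacy.blank("en")
    # Made-up tokens stand in for words the pipeline has never seen (assumption).
    texts = [f"zxq{i}vb is not a real word" for i in range(1000)]
    before, after = string_store_growth(nlp, texts)
    print(f"string store entries: {before} -> {after}")
```

Calling `demo()` should print a second number that is larger than the first, since each made-up token is new to the pipeline.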

When serving spaCy with FastAPI, we can use [[gunicorn#refreshing-workers-after-n-requests|gunicorn max_requests]] to recycle workers after a fixed number of requests and keep spaCy from running out of memory.
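A gunicorn config file for this setup might look like the sketch below. The numbers are illustrative assumptions, not tuned recommendations, and it assumes the FastAPI app is run through uvicorn's gunicorn worker class.

```python
# gunicorn.conf.py (sketch; all values are illustrative assumptions)

# Run the FastAPI (ASGI) app through uvicorn's gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Each worker process loads its own copy of the spaCy model.
workers = 2

# Restart a worker after it has handled this many requests,
# which discards whatever vocabulary growth it has accumulated.
max_requests = 1000

# Add a random offset of up to this many requests per worker,
# so that the workers do not all restart at the same moment.
max_requests_jitter = 100
```

This could be started with something like `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is a hypothetical module and app name.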