VNLP: Turkish NLP Package

arXiv:2403.01309 · cs.CL, cs.AI, cs.LG · Submitted March 2, 2024

We present VNLP: the first dedicated, complete, open-source, well-documented, lightweight, production-ready, state-of-the-art NLP package for the Turkish language.

VNLP covers a wide range of tasks:

  • Sentence Splitting & Text Normalization
  • Sentiment Analysis
  • Named Entity Recognition (NER)
  • Morphological Analysis & Disambiguation
  • Part-of-Speech (POS) Tagging

The token classification models are based on the “Context Model”, a novel architecture that is both an encoder and an auto-regressive model. The package also ships with pre-trained word embeddings and the corresponding SentencePiece Unigram tokenizers.
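
To make the "encoder and auto-regressive" description concrete, here is a toy sketch of auto-regressive token classification in plain Python. It is not the Context Model itself. It assumes one reading of the description: each tag is predicted from the whole sentence (the encoder side) together with the tags already predicted for earlier tokens (the auto-regressive side). The names `autoregressive_tag`, `toy_score`, and `TOY_NAMES` are hypothetical, and the hand-written scorer stands in for a trained network.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer signature: (tokens, position, previous_tags) -> {tag: score}
Scorer = Callable[[Sequence[str], int, Sequence[str]], Dict[str, float]]

# Illustration-only name list; a real model would learn this from data.
TOY_NAMES = {"Ayşe", "Yılmaz"}


def autoregressive_tag(tokens: Sequence[str], score: Scorer) -> List[str]:
    """Greedy left-to-right decoding.

    The scorer sees the full token sequence and every tag predicted so far,
    so each decision is conditioned on the earlier ones.
    """
    tags: List[str] = []
    for i in range(len(tokens)):
        scores = score(tokens, i, tags)
        tags.append(max(scores, key=scores.get))
    return tags


def toy_score(tokens: Sequence[str], i: int, prev_tags: Sequence[str]) -> Dict[str, float]:
    """Hand-written stand-in for a trained model (assumption, not VNLP code)."""
    scores = {"O": 1.0, "B-PER": 0.0, "I-PER": 0.0}
    if tokens[i] in TOY_NAMES:
        # Continue a person span if the previous predicted tag opened or continued one.
        if prev_tags and prev_tags[-1] in ("B-PER", "I-PER"):
            scores["I-PER"] = 2.0
        else:
            scores["B-PER"] = 2.0
    return scores


tokens = ["Dün", "Ayşe", "Yılmaz", "Ankara'ya", "gitti"]
print(autoregressive_tag(tokens, toy_score))
# → ['O', 'B-PER', 'I-PER', 'O', 'O']
```

The point of the sketch is the dependency structure: "Yılmaz" is tagged I-PER only because the previous prediction was B-PER, which a purely per-token classifier would not see.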

VNLP is available on PyPI and provides Python and command-line APIs, Read the Docs documentation, and a live demo.

Links: arXiv · PDF · GitHub