Overview
SudachiPy is a Python port of Sudachi, a Japanese morphological analyzer. It tokenizes Japanese text at three levels of granularity, selected by split mode: A yields the shortest units, C the longest, and B falls in between. For each token it provides the part-of-speech tags, normalized form, reading form, and dictionary information. SudachiPy can be used as a command-line tool or imported as a Python package, and it supports user dictionaries for customizing tokenization. Analysis is dictionary-based: a system dictionary, available in small, core, and full editions, supplies the morphological information.
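The split modes and per-token fields described above can be sketched in a few lines. This is a minimal illustration, not a definitive recipe: it assumes the `sudachipy` package and a system dictionary package such as `sudachidict_core` are installed. The helper names (`load_tokenizer_and_modes`, `describe`, `print_all_modes`) are invented for this example, and the import is deferred so that `describe` can be exercised without a dictionary present.

```python
from typing import Any, Dict, Iterable, List


def load_tokenizer_and_modes():
    """Create a SudachiPy tokenizer and a name -> split-mode mapping.

    Assumption: `sudachipy` and a system dictionary (for example
    `sudachidict_core`) are installed. The import lives inside the
    function so the rest of this sketch loads without them.
    """
    from sudachipy import dictionary, tokenizer

    modes = {
        "A": tokenizer.Tokenizer.SplitMode.A,  # shortest units
        "B": tokenizer.Tokenizer.SplitMode.B,  # middle granularity
        "C": tokenizer.Tokenizer.SplitMode.C,  # longest units
    }
    return dictionary.Dictionary().create(), modes


def describe(morphemes: Iterable[Any]) -> List[Dict[str, Any]]:
    """Collect the per-token fields into plain dicts.

    Works on any objects exposing the Morpheme accessor methods
    surface(), part_of_speech(), normalized_form(), reading_form().
    """
    return [
        {
            "surface": m.surface(),
            "pos": m.part_of_speech(),
            "normalized": m.normalized_form(),
            "reading": m.reading_form(),
        }
        for m in morphemes
    ]


def print_all_modes(text: str) -> None:
    """Tokenize `text` in modes A, B, and C and print the surface forms."""
    tok, modes = load_tokenizer_and_modes()
    for name, mode in modes.items():
        rows = describe(tok.tokenize(text, mode))
        print(name, [row["surface"] for row in rows])
```

Calling `print_all_modes("国家公務員")` with a dictionary installed should print one line per mode, with mode A producing the most segments and mode C the fewest. The exact segmentation depends on the dictionary edition and version in use.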
