SudachiPy is a Python version of Sudachi, a Japanese morphological analyzer used for tokenizing Japanese text.

How do I install SudachiPy?

You can install SudachiPy using pip: `pip install sudachipy`. You also need to install a dictionary: `pip install sudachidict_core`.

What are the different tokenization modes in SudachiPy?

SudachiPy offers three tokenization (split) modes: A, B, and C. Mode A produces the shortest units (finest granularity), Mode C produces the longest (coarsest), and Mode B falls in between.
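
A minimal sketch for comparing the three modes, assuming `sudachipy` and `sudachidict_core` are installed. It uses the `dictionary.Dictionary().create()` and `tokenizer.Tokenizer.SplitMode` interface shown in the SudachiPy README. The helper name `tokenize_all_modes` is hypothetical.

```python
def tokenize_all_modes(text: str) -> dict[str, list[str]]:
    """Tokenize `text` in split modes A, B and C and return surface forms per mode."""
    # Imported inside the function so this module can be loaded without SudachiPy.
    from sudachipy import dictionary, tokenizer

    # Loads the installed system dictionary (sudachidict_core by default).
    tokenizer_obj = dictionary.Dictionary().create()
    split_mode = tokenizer.Tokenizer.SplitMode
    modes = {"A": split_mode.A, "B": split_mode.B, "C": split_mode.C}
    return {
        name: [m.surface() for m in tokenizer_obj.tokenize(text, mode)]
        for name, mode in modes.items()
    }
```

Calling `tokenize_all_modes("国家公務員")` returns one list of surface forms per mode. A compound like this is expected to stay whole in mode C and be split into smaller units in mode A, though the exact splits depend on the dictionary. Loading a dictionary is relatively expensive, so real code would create the tokenizer once and reuse it.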

How can I use a user dictionary with SudachiPy?

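List the paths to your user dictionaries under the `userDict` key (an array) in the `sudachi.json` configuration file. A user dictionary is written as a CSV source file and must be compiled into binary form with the `sudachipy ubuild` command before it can be used.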
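
A sketch of registering user dictionaries programmatically. It assumes you already have a `sudachi.json` to edit and a compiled user dictionary file. The helper `add_user_dicts` is hypothetical; it uses only the standard library and changes nothing except the `userDict` key.

```python
import json
from pathlib import Path


def add_user_dicts(config_path: str, *user_dict_paths: str) -> list[str]:
    """Append compiled user dictionaries to the `userDict` list in sudachi.json.

    Paths are stored as absolute paths and duplicates are skipped.
    Returns the resulting `userDict` list.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Create the key if the config does not have one yet.
    user_dicts = config.setdefault("userDict", [])
    for p in user_dict_paths:
        resolved = str(Path(p).resolve())
        if resolved not in user_dicts:
            user_dicts.append(resolved)

    # ensure_ascii=False keeps any Japanese text in the file human-readable.
    path.write_text(json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8")
    return user_dicts
```

You can then pass the edited file to SudachiPy, for example through the `config_path` argument of `dictionary.Dictionary` (check the signature in your installed version). Writing absolute paths avoids ambiguity about how relative paths in the config are resolved.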

Is SudachiPy actively maintained?

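Not in its original repository. The original SudachiPy repository (the pure-Python implementation) was archived by the owner on Mar 9, 2023, and is now read-only. SudachiPy 0.6 and later are implemented in Rust and developed as the Python bindings of the Sudachi.rs project, which is where the `sudachipy` package now receives releases.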

What dictionaries are available for SudachiPy?

The Sudachi dictionary comes in three editions: small, core, and full. Each edition is installed as a Python package (`sudachidict_small`, `sudachidict_core`, and `sudachidict_full`), and SudachiPy uses `sudachidict_core` by default.
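
A small standard-library sketch that reports which of the three dictionary packages can be imported in the current environment. The mapping uses the package names listed above. The function name `installed_editions` is hypothetical.

```python
import importlib.util

# Edition name -> importable package name, as listed above.
EDITION_PACKAGES = {
    "small": "sudachidict_small",
    "core": "sudachidict_core",
    "full": "sudachidict_full",
}


def installed_editions() -> list[str]:
    """Return the Sudachi dictionary editions whose packages are importable."""
    return [
        edition
        for edition, module in EDITION_PACKAGES.items()
        # find_spec returns None when the package cannot be found.
        if importlib.util.find_spec(module) is not None
    ]
```

An empty list means that no dictionary is installed yet. In that case, `pip install sudachidict_core` provides the default edition.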

SudachiPy

SudachiPy is a Python port of the Sudachi Japanese morphological analyzer. It tokenizes Japanese text in three split modes (A, B, C), so the same text can be segmented at different granularities. For each token, it provides part-of-speech tags, the normalized form, the reading form, and dictionary information such as the dictionary form. SudachiPy can be used both as a command-line tool and as a Python package, and it supports user dictionaries for customizing tokenization. The analysis is dictionary-based: the installed dictionary edition (small, core, or full) supplies the morphological information.
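
A minimal sketch of reading the per-token information described above, assuming `sudachipy` and `sudachidict_core` are installed. The `Morpheme` methods it calls (`surface`, `part_of_speech`, `dictionary_form`, `normalized_form`, `reading_form`) are the ones shown in the SudachiPy README. The `analyze` helper and its dict-based output format are illustrative.

```python
def analyze(text: str, mode: str = "C") -> list[dict]:
    """Return per-token information for `text` under split mode "A", "B" or "C"."""
    if mode not in ("A", "B", "C"):
        raise ValueError(f"mode must be 'A', 'B' or 'C', got {mode!r}")

    # Imported inside the function so this module can be loaded without SudachiPy.
    from sudachipy import dictionary, tokenizer

    split_mode = getattr(tokenizer.Tokenizer.SplitMode, mode)
    tokenizer_obj = dictionary.Dictionary().create()
    return [
        {
            "surface": m.surface(),
            "part_of_speech": list(m.part_of_speech()),  # hierarchical POS fields
            "dictionary_form": m.dictionary_form(),
            "normalized_form": m.normalized_form(),  # spelling variants unified
            "reading_form": m.reading_form(),  # reading in katakana
        }
        for m in tokenizer_obj.tokenize(text, split_mode)
    ]
```

Because normalization maps spelling variants to a common form, a token's normalized form can differ from both its surface and its dictionary form.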

Key Features

- Multi-granular Tokenization
- User Dictionary Support
- Part-of-Speech Tagging
- Normalization
- Reading Form Conversion

Use Cases

- Sentiment Analysis of Japanese Tweets
- Named Entity Recognition in Japanese News Articles
- Machine Translation from Japanese to English
- Japanese Text Summarization
- Keyword Extraction from Japanese Documents
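
For the keyword-extraction use case, here is a naive frequency-based sketch. It counts nouns (tokens whose first part-of-speech field is 名詞) by their normalized form. This simple counting heuristic is chosen for illustration and is not a feature provided by SudachiPy. It assumes `sudachipy` and `sudachidict_core` are installed, and `top_nouns` is a hypothetical name.

```python
from collections import Counter


def top_nouns(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Naive keyword extraction: the n most frequent nouns in `text`.

    Nouns are counted by normalized form so spelling variants are merged.
    """
    # Imported inside the function so this module can be loaded without SudachiPy.
    from sudachipy import dictionary, tokenizer

    tokenizer_obj = dictionary.Dictionary().create()
    mode = tokenizer.Tokenizer.SplitMode.C  # coarsest units keep compounds whole
    counts = Counter(
        m.normalized_form()
        for m in tokenizer_obj.tokenize(text, mode)
        if m.part_of_speech()[0] == "名詞"  # first POS field is the major category
    )
    return counts.most_common(n)
```

Mode C is used so that compounds such as 国家公務員 are counted as single keywords. Switching to mode A would count their parts instead.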

Pricing: Free and open source.