
ub-MOJI: A Japanese Fingerspelling Video Dataset

ub-MOJI is a Japanese fingerspelling video dataset designed to advance research in sign language recognition.

Dataset Demo

ub-MOJI Features

Japanese Fingerspelling Coverage

Supports syllables, 5-character sequences, and full words

Temporal Annotation

Precise start-end timing for every fingerspelled unit

Rich Participant Metadata

Includes detailed demographic and consent metadata for participant-aware modeling and analysis.

Academic-Only License

Available under terms restricting use to non-commercial academic research

Metadata Overview

An overview of the basic information and participant information included in this dataset. Provided as CSV files.

metadata.csv

Basic information such as the video file path, category, and recording conditions.

| Field Name | Type | Description |
|---|---|---|
| `file_name` | str | File path of the video sample |
| `classes` | List[str] | Fingerspelled unit (e.g., `["a"]`, `["ka", "ma", "ku", "ra"]`) |
| `category` | int | Linguistic unit category: `0=syllable`, `1=sequence`, or `2=word` |
| `participant_id` | int | Participant identifier (e.g., `18`) |
| `recording_date` | int | Year and month of recording (e.g., `202403`) |
| `fps` | int | Frames per second (e.g., `30`) |
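As a minimal sketch of working with this schema, the snippet below parses a metadata.csv fragment with pandas. The rows are illustrative placeholders, not actual dataset entries; note that the `classes` list arrives as a serialized string in CSV and must be restored with `ast.literal_eval`.

```python
import ast
import io

import pandas as pd

# Illustrative metadata.csv fragment (not actual dataset rows).
csv_text = """file_name,classes,category,participant_id,recording_date,fps
videos/sample_0001.mp4,"[""a""]",0,18,202403,30
videos/sample_0002.mp4,"[""ka"", ""ma"", ""ku"", ""ra""]",1,18,202403,30
"""

df = pd.read_csv(io.StringIO(csv_text))

# The `classes` field is a Python list serialized into the CSV cell;
# ast.literal_eval turns it back into a real list.
df["classes"] = df["classes"].apply(ast.literal_eval)

# Select only single-syllable samples (category 0).
syllables = df[df["category"] == 0]
```

The same pattern applies when reading the real file with `pd.read_csv("metadata.csv")`.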

participants.csv

Anonymized attribute information of participants who cooperated in data collection.

| Field Name | Type | Description |
|---|---|---|
| `participant_id` | int | Participant identifier (e.g., `18`) |
| `age_group` | str | Age decade group (e.g., `40` for age 40-49; `-1` if not provided) |
| `gender` | int | Gender category: `0=female`, `1=male`, `-1` if unspecified |
| `dominant_hand` | int | Dominant hand: `0=right`, `1=left`, `-1` if unspecified |
| `experience_years` | str | Years of sign language experience: one of `1-3`, `4-6`, ..., `51+`, or `-1` |
| `hearing_level` | int | Self-reported hearing ability: `0` (no issue) to `4` (severe), or `-1` (unknown) |
| `face_visibility` | int | Face visibility consent: `1=agreed`, `0=declined` |
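For participant-aware analysis, the two CSVs can be joined on `participant_id`. The sketch below uses illustrative placeholder rows (not actual dataset values) and shows one reasonable way to treat the `-1` sentinels as missing data before analysis.

```python
import pandas as pd

# Illustrative fragments of the two tables (placeholder values).
meta = pd.DataFrame({
    "file_name": ["videos/s1.mp4", "videos/s2.mp4"],
    "participant_id": [18, 21],
})
participants = pd.DataFrame({
    "participant_id": [18, 21],
    "age_group": ["40", "-1"],   # "-1" = not provided
    "dominant_hand": [0, -1],    # 0=right, 1=left, -1=unspecified
})

# Attach participant attributes to each video sample.
merged = meta.merge(participants, on="participant_id", how="left")

# Drop rows whose dominant hand is the -1 "unspecified" sentinel.
known_hand = merged[merged["dominant_hand"] != -1]
```

Filtering sentinels explicitly, rather than letting `-1` flow into statistics, avoids skewing demographic summaries.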

Temporal Annotation

annotations.toml

ub-MOJI supports temporal action detection tasks. It provides annotations in TOML files indicating the start and end positions of fingerspelling classes within each video sample. Each top-level TOML table represents a single video, identified by a unique video ID. All annotations were performed manually by the authors or contributors.

Authors

Citation

@misc{ubmoji2025,
  title     = {ub-MOJI},
  author    = {Tamon Kondo and Ryota Murai and Naoto Tsuta and Yousun Kang},
  year      = {2025},
  url       = {https://huggingface.co/datasets/kanglabs/ub-MOJI},
  publisher = {Hugging Face}
}
@inproceedings{Murai2025pointSupervisedJF,
  title     = {Point-Supervised Japanese Fingerspelling Localization via HR-Pro and Contrastive Learning},
  author    = {Ryota Murai and Naoto Tsuta and Duk Shin and Yousun Kang},
  booktitle = {Proceedings of 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year      = {2025},
}