Source strings and zh-CN translate resources of Telegram
-
Updated
Dec 13, 2020 - Python
Source strings and zh-CN translate resources of Telegram
Collect tweets (tweets corpus) using Twitter API. Collection can be based on hashtags, keywords, geographical location
BERT models with tokenization for Japanese texts.
💬 Cross-platform application for the creation of language resources from ELAN linguistic analysis files, or from scratch.
Build scripts for the UniSegments collection of morphologically segmented lexicons for many languages
The scripts for compiling the Universal Derivations collections of harmonised word-formation resources for multiple langugaes.
This is a web application that will serve to be the community-driven go-to site for finding Chinese resources and learning Mandarin.
Script for simplifying the process of translating MineOS Language (.lang) files
Selected data processing scripts including language agnostic multilingual wiktionary parser
Converts the Folkets Lexikon Swedish-English dictionary into an optimized JSON format with enhanced structure and metadata. Provides clean, programmatically accessible dictionary data for developers and language learning applications.
A paradigm-based morphological dictionary of the Ukrainian language. Built as a reproducible NLP resource, it provides structured source data and a generator script to produce fully tagged word forms for verbs, nouns, adjectives, and participles.
Estonian Grammatical Error Correction (GEC) test and development corpus that contains L2 learner texts error-annotated in the M2 format.
Open Pashto language resources for ASR, TTS, NLP, and machine translation
Español: cree un conjunto de datos de tarjetas flash a partir de un archivo .txt. Palabra, significado, etimología, ejemplos, clase. English: create Flash Cards dataset from a .txt file. Word, meaning, etymology, examples, class.
Function Words in Philippine Languages
Simple parser for Dict.cc dictionary
Gets text and extracts sentences in a language from text using that language's lexicon.
This repo contaings PDF, text and manually edited files of ka Dienshonhia dictionary digitalisation work
Add a description, image, and links to the language-resources topic page so that developers can more easily learn about it.
To associate your repository with the language-resources topic, visit your repo's landing page and select "manage topics."