| Title: | A Naive IPA Tokeniser | 
| Version: | 0.1.0 | 
| Date: | 2025-02-23 | 
| Description: | It provides users with functions to parse International Phonetic Alphabet (IPA) transcriptions into individual phones (tokenisation) based on default IPA symbols and optional user specified multi-character phones. The tokenised transcriptions can be used for obtaining counts of phones or for searching for words matching phonetic patterns. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.3.2 | 
| Imports: | cli, dplyr, lifecycle, magrittr, stringi, stringr, tibble, Unicode | 
| Depends: | R (≥ 2.10) | 
| Suggests: | rmarkdown, knitr, tidyverse | 
| VignetteBuilder: | knitr | 
| URL: | https://github.com/stefanocoretta/phonetisr, https://stefanocoretta.github.io/phonetisr/ | 
| NeedsCompilation: | no | 
| Packaged: | 2025-02-25 13:39:25 UTC; ste | 
| Author: | Stefano Coretta | 
| Maintainer: | Stefano Coretta <stefano.coretta@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-02-26 13:10:02 UTC | 
phonetisr: A Naive IPA Tokeniser
Description
It provides users with functions to parse International Phonetic Alphabet (IPA) transcriptions into individual phones (tokenisation) based on default IPA symbols and optional user specified multi-character phones. The tokenised transcriptions can be used for obtaining counts of phones or for searching for words matching phonetic patterns.
Author(s)
Maintainer: Stefano Coretta stefano.coretta@gmail.com (ORCID)
See Also
Useful links:
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
| lhs | A value or the magrittr placeholder. | 
| rhs | A function call using the magrittr semantics. | 
Value
The result of calling rhs(lhs).
Add features to list of phones
Description
This function counts occurrences of phones and includes basic phonetic features.
Usage
featurise(phlist)
Arguments
| phlist | A list of phones or the output of  | 
Value
A tibble.
Examples
ipa <- c("ada", "buba", "kiki", "sa\u0283a")
ip_ph <- phonetise(ipa)
featurise(ip_ph)
Get non-IPA characters.
Description
Given a vector of characters, it returns those which are not part of the IPA.
Usage
get_no_ipa(chars)
Arguments
| chars | A vector of characters. | 
Value
A vector.
Examples
get_no_ipa(c("a", "\0283", ">"))
List of IPA symbols
Description
List of IPA symbols
Usage
ipa_symbols
Format
A data frame with 143 rows and 12 variables:
- IPA
- IPA symbol. 
- unicode
- Unicode code. 
- uni_name
- Unicode name. 
- ipa_name
- IPA name. 
- phon_type
- The phonetic type of the symbol. 
- type
- General character type ( - consonant,- vowel,- diacritic).
- height_ipa
- Vowel openness. 
- height
- Vowel height. 
- backness
- Vowel backness. 
- rounding
- Vowel rounding. 
- voicing
- Consonant voicing. 
- place
- Consonant place of articulation. 
- manner
- Consonant manner of articulation. 
- lateral
- Is the consonant lateral? 
- sonorant
- Is the phone sonorant? 
Klingon Swadesh list
Description
The Swadesh list in Klingon.
Usage
kl_swadesh
Format
A data frame with 195 rows and 4 variables:
- id
- Swadesh list item number. 
- gloss
- English gloss. 
- translit
- Klingon transliteration. 
- ipa
- IPA transcription. 
Search phones
Description
Given a vector of phonetised strings, find phones.
Usage
ph_search(phlist, phonex)
Arguments
| phlist | The output of  | 
| phonex | A phonetic expression. Supported shorthands are  | 
Value
A list.
Examples
ipa <- c("p\u02B0a\u0303k\u02B0", "t\u02B0um\u0325", "\u025Bk\u02B0\u026F", "pun")
ph <- c("p\u02B0", "t\u02B0", "k\u02B0", "a\u0303", "m\u0325")
ipa_ph <- phonetise(ipa, multi = ph)
ph_search(ipa_ph, "#CV")
# partial matches are also returned
ph_search(ipa_ph, "p")
# use regular expressions
ph_search(ipa_ph, "p\u02B0?V")
Tokenise IPA strings
Description
phonetise() tokenises strings of IPA symbols (like phonetic transcriptions
of words) into individual "phones". The output is a list.
Usage
phonetise(
  strings,
  multi = NULL,
  regex = NULL,
  split = TRUE,
  sep = " ",
  sanitise = TRUE,
  ignore_stress = TRUE,
  ignore_tone = TRUE,
  diacritics = FALSE,
  affricates = FALSE,
  v_sequences = FALSE,
  prenasalised = FALSE,
  all_multi = FALSE,
  sanitize = sanitise
)
phonetize(
  strings,
  multi = NULL,
  regex = NULL,
  split = TRUE,
  sep = " ",
  sanitise = TRUE,
  ignore_stress = TRUE,
  ignore_tone = TRUE,
  diacritics = FALSE,
  affricates = FALSE,
  v_sequences = FALSE,
  prenasalised = FALSE,
  all_multi = FALSE,
  sanitize = sanitise
)
Arguments
| strings | A character vector with a list of words in IPA. | 
| multi | A character vector of one or more multi-character phones as strings. | 
| regex | A string with a regular expression to match several multi-character phones. | 
| split | If set to  | 
| sep | A character to be used as the separator of the phones if  | 
| sanitise | Whether to remove all non-IPA characters ( | 
| ignore_stress | If  | 
| ignore_tone | If  | 
| diacritics | If set to  | 
| affricates | If set to  | 
| v_sequences | If set to  | 
| prenasalised | If set to  | 
| all_multi | If set to  | 
| sanitize | Alias of  | 
Value
A list of phonetised strings.
Examples
# using unicode escapes for CRAN policy
ipa <- c("p\u02B0a\u0303k\u02B0", "t\u02B0um\u0325", "\u025Bk\u02B0\u026F")
ph <- c("p\u02B0", "t\u02B0", "k\u02B0", "a\u0303", "m\u0325")
phonetise(ipa, multi = ph)
ph_2 <- ph[4:5]
# Match any character followed by <\u02B0> with ".\u02B0".
phonetise(ipa, multi = ph_2, regex = ".\u02B0")
# Same result.
phonetise(ipa, regex = ".(\u0303|\u0325|\u02B0)")
# Don't split strings and use "." as separator
phonetise(ipa, multi = ph, split = FALSE, sep = ".")