tokenizers: Fast, Consistent Tokenization of Natural Language Text
Convert natural language text into tokens. Includes tokenizers for
    shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs,
    characters, shingled characters, lines, Penn Treebank tokens, and regular
    expressions, as well as functions for counting characters, words, and
    sentences, and a function for splitting longer texts into separate
    documents, each with the same number of words. The tokenizers have a
    consistent interface, and the package is built on the 'stringi' and
    'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
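A minimal sketch of the consistent interface, using the package's exported tokenizer functions (defaults and output details may vary slightly across versions):

```r
# install.packages("tokenizers")
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog. It barked."

# Every tokenizer takes a character vector (one element per document)
# and returns a list of character vectors (one element per document).
tokenize_words(text)                     # lowercased word tokens
tokenize_ngrams(text, n = 3, n_min = 2)  # shingled 2-grams and 3-grams
tokenize_sentences(text)                 # sentence tokens
tokenize_characters(text)                # character tokens

count_words(text)                        # word count per document
chunk_text(text, chunk_size = 5)         # split into ~5-word documents
```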
| Version: | 0.3.0 | 
| Depends: | R (≥ 3.1.3) | 
| Imports: | stringi (≥ 1.0.1), Rcpp (≥ 0.12.3), SnowballC (≥ 0.5.1) | 
| LinkingTo: | Rcpp | 
| Suggests: | covr, knitr, rmarkdown, stopwords (≥ 0.9.0), testthat | 
| Published: | 2022-12-22 | 
| DOI: | 10.32614/CRAN.package.tokenizers | 
| Author: | Lincoln Mullen [aut, cre], Os Keyes [ctb], Dmitriy Selivanov [ctb], Jeffrey Arnold [ctb], Kenneth Benoit [ctb] |
| Maintainer: | Lincoln Mullen <lincoln at lincolnmullen.com> |
| BugReports: | https://github.com/ropensci/tokenizers/issues | 
| License: | MIT + file LICENSE | 
| URL: | https://docs.ropensci.org/tokenizers/, https://github.com/ropensci/tokenizers |
| NeedsCompilation: | yes | 
| Citation: | tokenizers citation info | 
| Materials: | README, NEWS | 
| In views: | NaturalLanguageProcessing | 
| CRAN checks: | tokenizers results | 
Reverse dependencies:
| Reverse imports: | blocking, covfefe, deeplr, DeepPINCS, DramaAnalysis, pdfsearch, proustr, rslp, textrecipes, tidypmc, tidytext, ttgsea, wactor, WhatsR | 
| Reverse suggests: | edgarWebR, sumup, torchdatasets | 
| Reverse enhances: | quanteda | 
Linking:
Please use the canonical form https://CRAN.R-project.org/package=tokenizers to link to this page.