| Title: | Parse and Deduplicate Author Names | 
| Version: | 0.2.0 | 
| Description: | Utilities to parse author fields from DESCRIPTION files and general-purpose functions to deduplicate names in databases, beyond the specific case of R package authors. | 
| License: | MIT + file LICENSE | 
| URL: | https://github.com/Bisaloo/authoritative, https://hugogruson.fr/authoritative/ | 
| BugReports: | https://github.com/Bisaloo/authoritative/issues | 
| Depends: | R (≥ 4.1.0) | 
| Imports: | stringi, utils | 
| Suggests: | knitr, rmarkdown, spelling, testthat (≥ 3.0.0) | 
| VignetteBuilder: | knitr | 
| Config/Needs/website: | epiverse-trace/epiversetheme, tidyverse, igraph, netUtils | 
| Config/testthat/edition: | 3 | 
| Config/testthat/parallel: | true | 
| Encoding: | UTF-8 | 
| Language: | en-GB | 
| LazyData: | true | 
| RoxygenNote: | 7.3.2 | 
| Config/Needs/build: | moodymudskipper/devtag | 
| NeedsCompilation: | no | 
| Packaged: | 2025-06-23 16:56:27 UTC; hugo | 
| Author: | Hugo Gruson | 
| Maintainer: | Hugo Gruson <hugo.gruson+R@normalesup.org> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-06-24 07:50:11 UTC | 
authoritative: Parse and Deduplicate Author Names
Description
Utilities to parse author fields from DESCRIPTION files and general-purpose functions to deduplicate names in databases, beyond the specific case of R package authors.
Author(s)
Maintainer: Hugo Gruson <hugo.gruson+R@normalesup.org> (ORCID) [copyright holder]
Other contributors:
- Chris Hartgerink (ORCID) [reviewer] 
- data.org (until version 0.2.0 included) [funder] 
See Also
Useful links:
- Report bugs at https://github.com/Bisaloo/authoritative/issues 
A data.frame of historical metadata from CRAN epidemiology packages.
Description
A data.frame of historical metadata from CRAN epidemiology packages.
Usage
cran_epidemiology_packages
Format
A data.frame with 5 variables:
- Package: package name
- Version: package version
- Authors@R: authors as listed in the Authors@R field of the DESCRIPTION file
- Author: authors as listed in the Author field of the DESCRIPTION file
- Maintainer: package maintainer
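The examples on the other help pages filter this dataset with subset() before parsing. The sketch below shows that step on a tiny hypothetical data.frame with the same five columns; the package names, people and email addresses are invented for illustration and are not rows of cran_epidemiology_packages. Note check.names = FALSE, which keeps the non-syntactic column name Authors@R intact.

```r
# Hypothetical toy rows mimicking the documented columns of
# cran_epidemiology_packages; all values are invented for illustration.
toy_pkgs <- data.frame(
  Package = c("pkgA", "pkgB"),
  Version = c("1.0.0", "0.3.1"),
  `Authors@R` = c('person("Jane", "Doe", role = c("aut", "cre"))', NA),
  Author = c("Jane Doe [aut, cre]", "John Smith"),
  Maintainer = c("Jane Doe <jane@example.org>", "John Smith <john@example.org>"),
  check.names = FALSE  # keep the column name "Authors@R" as-is
)

# Keep rows with a non-missing Authors@R value and return that column
# as a plain character vector (drop = TRUE)
authors_r <- subset(toy_pkgs, !is.na(`Authors@R`), `Authors@R`, drop = TRUE)
authors_r
```

Packages that only declare the free-text Author field (here pkgB) are dropped by this filter, which is why the examples pair parse_authors_r() with the Authors@R column and parse_authors() with the Author column.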
Expand names from abbreviated forms or initials
Description
Expand names from abbreviated forms or initials
Usage
expand_names(short, expanded)
Arguments
| short | A character vector of potentially abbreviated names | 
| expanded | A character vector of potentially expanded names | 
Details
When you have a character vector x of abbreviated and non-abbreviated names
and you want to deduplicate them, this function can be used as
expand_names(x, x), which will return the most expanded version available in
x for each name.
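To make the expand_names(x, x) idea concrete, here is a minimal base R sketch of one way to match abbreviated names to expanded ones. It is not the package's implementation: is_abbreviation_of() and expand_sketch() are hypothetical helpers, and the matching rule (each word of the short name must match a later word of the long name, either exactly or as its single-letter initial) is an assumption chosen for illustration.

```r
# Minimal sketch of the idea behind expand_names(), NOT the package's code.
# is_abbreviation_of() and expand_sketch() are hypothetical helpers.

# TRUE if every word of `short` matches, in order, some word of `long`,
# either identically or as its single-letter initial. Deliberately simple:
# it ignores punctuation such as "W." and a lone surname matches too.
is_abbreviation_of <- function(short, long) {
  s <- strsplit(short, " ", fixed = TRUE)[[1]]
  l <- strsplit(long, " ", fixed = TRUE)[[1]]
  j <- 1L
  for (word in s) {
    found <- FALSE
    while (j <= length(l)) {
      candidate <- l[j]
      j <- j + 1L
      if (word == candidate ||
          (nchar(word) == 1L && word == substr(candidate, 1L, 1L))) {
        found <- TRUE
        break
      }
    }
    if (!found) return(FALSE)
  }
  TRUE
}

# For each name in `short`, return the longest element of `expanded` that it
# abbreviates, or the name itself when nothing matches.
expand_sketch <- function(short, expanded) {
  vapply(short, function(name) {
    matches <- expanded[vapply(expanded, is_abbreviation_of,
                               logical(1), short = name)]
    if (length(matches) == 0L) return(name)
    matches[which.max(nchar(matches))]
  }, character(1), USE.NAMES = FALSE)
}

x <- c("W Mozart", "Wolfgang Mozart", "J S Bach", "Johann Sebastian Bach")
expand_sketch(x, x)
```

Calling the sketch with the same vector twice mirrors the expand_names(x, x) pattern: each spelling is replaced by the longest variant in x that it abbreviates.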
Value
A character vector with the same length as short
Examples
expand_names(
  c("W A Mozart", "Wolfgang Mozart", "Wolfgang A Mozart"),
  "Wolfgang Amadeus Mozart"
)
# Real-case application example
# Deduplicate names in a vector, as described in Details
epi_pkg_authors <- cran_epidemiology_packages |>
  subset(!is.na(`Authors@R`), `Authors@R`, drop = TRUE) |>
  parse_authors_r() |>
  # Drop email, role, ORCID and format as string rather than person object
  lapply(function(x) format(x, include = c("given", "family"))) |>
  unlist()
# Number of distinct spellings before deduplication
length(unique(epi_pkg_authors))
# Deduplicate
epi_pkg_authors_normalized <- expand_names(epi_pkg_authors, epi_pkg_authors)
length(unique(epi_pkg_authors_normalized))
Invert 'LastName FirstName' to 'FirstName LastName' (or the reverse)
Description
Invert 'LastName FirstName' to 'FirstName LastName' (or the reverse)
Usage
invert_names(names, correct_names)
Arguments
| names | A character vector of potentially inverted names | 
| correct_names | A character vector of correct names | 
Details
When you have a character vector x of mixed 'First Last' and 'Last First'
names, but no source of truth, and you want to deduplicate them, this function
can be used as invert_names(x, x), which will return the most common version
available in x for each name.
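The "most common version" rule can be sketched in base R as follows. This is a hypothetical illustration, not the package's implementation: invert_sketch() is an invented helper, and it assumes that an inversion means moving the first word to the end or the last word to the front, and that the original spelling is kept on ties.

```r
# Minimal sketch of the idea behind invert_names(), NOT the package's code.
# invert_sketch() is a hypothetical helper written for illustration.
invert_sketch <- function(names, correct_names) {
  vapply(names, function(name) {
    words <- strsplit(name, " ", fixed = TRUE)[[1]]
    n <- length(words)
    if (n < 2L) return(name)
    # Candidate inversions: first word moved to the end, or last word
    # moved to the front (identical for two-word names)
    candidates <- unique(c(
      paste(c(words[-1L], words[1L]), collapse = " "),
      paste(c(words[n], words[-n]), collapse = " ")
    ))
    options <- c(name, candidates)
    # How often each option appears in the reference vector
    freq <- vapply(options, function(o) sum(correct_names == o), integer(1))
    # which.max() returns the first maximum, so the original wins ties
    options[which.max(freq)]
  }, character(1), USE.NAMES = FALSE)
}

x <- c("Wolfgang Mozart", "Wolfgang Mozart", "Mozart Wolfgang")
invert_sketch(x, x)
```

Keeping the original spelling on ties is a conservative choice: a name is only rewritten when its inverted form is strictly more frequent in correct_names.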
Value
A character vector with the same length as names
Examples
invert_names(
  c("Wolfgang Mozart", "Mozart Wolfgang"),
  "Wolfgang Mozart"
)
# Real-case application example
# Deduplicate names in a vector, as described in Details
epi_pkg_authors <- cran_epidemiology_packages |>
  subset(!is.na(`Authors@R`), `Authors@R`, drop = TRUE) |>
  parse_authors_r() |>
  # Drop email, role, ORCID and format as string rather than person object
  lapply(function(x) format(x, include = c("given", "family"))) |>
  unlist()
# Number of distinct spellings before deduplication
length(unique(epi_pkg_authors))
# Deduplicate
epi_pkg_authors_normalized <- invert_names(epi_pkg_authors, epi_pkg_authors)
length(unique(epi_pkg_authors_normalized))
Parse the Author field from a DESCRIPTION file
Description
Parse the Author field from a DESCRIPTION file into a character vector of author names
Usage
parse_authors(author_string)
Arguments
| author_string | A character vector containing the contents of the Author field of a DESCRIPTION file | 
Value
A character vector, or a list of character vectors of the same length as
author_string.
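Free-text Author fields mix names with roles and separators, which is what makes them harder to parse than Authors@R. The base R sketch below shows the general approach on a simple case: strip bracketed roles and parenthesised notes, then split on commas, semicolons and the word "and". split_author_field() is a hypothetical helper, not the package's parse_authors(), which may handle many more formats.

```r
# Hypothetical naive splitter for a single free-text Author field;
# NOT the package's parse_authors().
split_author_field <- function(author_string) {
  # Remove bracketed roles such as "[aut, cre]" and parenthesised notes
  # (e.g. ORCID links) before splitting, so their commas are not separators
  x <- gsub("\\[[^]]*\\]|\\([^)]*\\)", "", author_string)
  # Split on commas, semicolons, or the word "and" surrounded by spaces
  parts <- strsplit(x, "\\s*[,;]\\s*|\\s+and\\s+")[[1]]
  parts <- trimws(parts)
  parts[nzchar(parts)]  # drop empty pieces left by trailing separators
}

split_author_field("Jane Doe [aut, cre], John Smith [ctb] and Ann Lee")
```

Removing the brackets before splitting matters: otherwise the comma inside "[aut, cre]" would be treated as a separator between two authors.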
Examples
# Read from a DESCRIPTION file directly
utils_description <- system.file("DESCRIPTION", package = "utils")
utils_authors <- read.dcf(utils_description, "Author")
parse_authors(utils_authors)
# Read from a database of CRAN metadata
cran_epidemiology_packages$Author |>
  parse_authors() |>
  unlist() |>
  unique() |>
  sort()
Parse the Authors@R field from a DESCRIPTION file
Description
Parse the Authors@R field from a DESCRIPTION file into a person object
Usage
parse_authors_r(authors_r_string)
Arguments
| authors_r_string | A character vector containing the contents of the Authors@R field of a DESCRIPTION file | 
Value
A person object, or a list of person objects of the same length as
authors_r_string.
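An Authors@R value is itself R code that builds a person object, so base R can turn it into one. The sketch below shows this with str2lang(), eval() and format() from base R and utils; it is an illustration of the underlying idea and makes no claim about how parse_authors_r() is implemented. The people in the string are invented. Evaluating a field runs arbitrary code, so only do this on trusted input.

```r
# Hedged sketch: evaluate an Authors@R string with base R. The names below
# are invented; parse_authors_r() may work differently.
authors_r_string <- 'c(
  person("Jane", "Doe", role = c("aut", "cre")),
  person("John", "Smith", role = "ctb")
)'

# str2lang() parses the string into a single call; eval() runs it and
# returns a person object of length 2 (utils::person, attached by default)
authors <- eval(str2lang(authors_r_string))

# Keep only given and family names, as in the examples on the other pages
format(authors, include = c("given", "family"))
```

The format() call with include = c("given", "family") is the same step used in the expand_names() and invert_names() examples to turn person objects into plain name strings.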
Examples
# Read from a DESCRIPTION file directly
pkg_description <- system.file("DESCRIPTION", package = "authoritative")
authors_r_pkg <- read.dcf(pkg_description, "Authors@R")
parse_authors_r(authors_r_pkg)
# Read from a database of CRAN metadata
cran_epidemiology_packages |>
  subset(!is.na(`Authors@R`), `Authors@R`, drop = TRUE) |>
  parse_authors_r() |>
  head()