Welcome to rfishbase 5! This is the fourth rewrite of
the original rfishbase package described in Boettiger et
al. (2012).
This is another streamlined redesign, following new capabilities for data hosting and access. This release relies on HuggingFace datasets to host data and metadata in parquet and schema.org formats.
Data access is simplified to use the simple HuggingFace datasets API instead of the previous contentid-based resolution. This allows metadata to be defined directly alongside the data on the hosting platform, independent of the R package.
A simplified access protocol relies on duckdbfs for
direct reads of tables. Several functions previously used only to manage
connections are now deprecated or removed, along with a significant
number of dependencies.
Core use still centers around the same package API using the
fb_tbl() function. Legacy helper functions for common
tables, like species(), are still accessible and can still
optionally filter by species name where appropriate. As before, loading
the full tables and sub-setting manually is still recommended.
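As a rough sketch of the legacy helper route mentioned above, the call below passes species names straight to species(). It assumes the first argument accepts a character vector of scientific names; the object name `tilapia_trout` is purely illustrative.

```r
library(rfishbase)

# Assumption: the first argument of species() takes scientific names.
fish <- c("Oreochromis niloticus", "Salmo trutta")
tilapia_trout <- species(fish)

# The helper returns a table, so the usual table tools apply.
nrow(tilapia_trout)
head(names(tilapia_trout))
```

Loading the full table with fb_tbl("species") and filtering it yourself remains the recommended route; the helper is mainly a convenience for short, name-based lookups.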
Historic helper functions like load_taxa() (combining
the taxonomic classification from Species, Genus, Family and Order
tables), validate_names(), and common_to_sci()
and sci_to_common() should be in working order, all using
table-based outputs.
rfishbase 1.0 relied on parsing of XML pages served directly from Fishbase.org.
rfishbase 2.0 relied on calls to a ruby-based API, fishbaseapi, that provided access to SQL snapshots of about 20 of the more popular tables in FishBase or SeaLifeBase.
rfishbase 3.0 side-stepped the API by making queries which directly downloaded compressed csv tables from a static web host. This substantially improved performance and reliability, particularly for large queries. The release largely remained backwards compatible with 2.0, and added more tables.
rfishbase 4.0 extended the static model and interface. Static tables were distributed in parquet and accessed through a provenance-based identifier. While old functions were retained, a new interface was introduced to provide easy access to all fishbase tables.
We welcome any feedback, issues or questions that users may encounter through our issues tracker on GitHub: https://github.com/ropensci/rfishbase/issues
remotes::install_github("ropensci/rfishbase")
library("rfishbase")
library("dplyr") # convenient but not required
All fishbase tables can be accessed by name using the fb_tbl() function:
fb_tbl("ecosystem")
# A tibble: 160,334 × 18
   autoctr E_CODE EcosystemRefno Speccode Stockcode Status CurrentPresence
     <int>  <int>          <int>    <int>     <int> <chr>  <chr>          
 1       1      1          50628      549       565 native Present        
 2       2      1            189      552       568 native Present        
 3       3      1            189      554       570 native Present        
 4       4      1          79732      873       889 native Present        
 5       5      1           5217      948       964 native Present        
 6       7      1          39852      956       972 native Present        
 7       8      1          39852      957       973 native Present        
 8       9      1          39852      958       974 native Present        
 9      10      1            188     1526      1719 native Present        
10      11      1            188     1626      1819 native Present        
# ℹ 160,324 more rows
# ℹ 11 more variables: Abundance <chr>, LifeStage <chr>, Remarks <chr>,
#   Entered <int>, Dateentered <dttm>, Modified <int>, Datemodified <dttm>,
#   Expert <int>, Datechecked <dttm>, WebURL <chr>, TS <dttm>
Use fb_tables() to see a list of all the table names (specify sealifebase if
desired). Careful, there are a lot of them! The fishbase databases have
grown a lot over the decades, and were not intended to be used directly by
most end-users, so you may have considerable work to determine what’s
what. Keep in mind that many variables can be estimated in different
ways (e.g. trophic level), and thus may report different values in
different tables. Also note that species name (or SpecCode) is not
always the primary key for a table: many tables are specific to stocks
or even individual samples, and some tables are reference lists that are
not species focused at all, but meant to be joined to other tables
(faoareas, etc). Compare tables against what you see on
fishbase.org, or ask on our issues forum for advice!
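Because the table list is long, searching it by name can help. The sketch below assumes fb_tables() returns a character vector of table names and accepts "sealifebase" as its first argument; the object names are illustrative.

```r
library(rfishbase)

# Assumption: fb_tables() returns a character vector of table names.
all_tables <- fb_tables()
length(all_tables)

# Narrow the list with a case-insensitive pattern match on the names.
grep("spawn", all_tables, value = TRUE, ignore.case = TRUE)

# The same listing for SeaLifeBase.
slb_tables <- fb_tables("sealifebase")
```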
fish <- c("Oreochromis niloticus", "Salmo trutta")
fb_tbl("species") %>% 
  mutate(sci_name = paste(Genus, Species)) %>%
  filter(sci_name %in% fish) %>% 
  select(sci_name, FBname, Length)
# A tibble: 2 × 3
  sci_name              FBname       Length
  <chr>                 <chr>         <dbl>
1 Oreochromis niloticus Nile tilapia     60
2 Salmo trutta          Sea trout       140
In most tables, species are identified by SpecCode (as
per best practices) rather than scientific names. Multiple tables can be
joined on the SpecCode to more fully describe a
species.
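A minimal sketch of such a join, using only the species and ecosystem tables and the column names visible in the printed output in this README. The ecosystem table spells its key `Speccode`, so the example names both key columns explicitly instead of relying on a natural join. It assumes dplyr 1.1.0 or later for join_by(); the object names are illustrative.

```r
library(rfishbase)
library(dplyr)

species_tbl <- fb_tbl("species") |>
  select(SpecCode, Genus, Species, FBname)

ecosystem_tbl <- fb_tbl("ecosystem") |>
  select(Speccode, E_CODE, Status, CurrentPresence)

# The key is spelled differently in the two tables, so match
# SpecCode (species) to Speccode (ecosystem) explicitly.
species_ecosystems <- species_tbl |>
  inner_join(ecosystem_tbl, by = join_by(SpecCode == Speccode))

species_ecosystems
```

Checking the key column's spelling before joining is worthwhile, since a natural join silently falls back to whatever column names the two tables happen to share.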
To filter species by taxonomic names, use the taxa table from
load_taxa(), which provides a joined table of taxonomy from
subspecies up through Class, along with the corresponding FishBase taxon
id codes. Here is an example workflow joining two of the spawning
tables and filtering to the grouper family, Epinephelidae:
library(rfishbase)
library(dplyr)
## Get the whole spawning and spawn agg table, joined together:
spawn <- left_join(fb_tbl("spawning"),  
                   fb_tbl("spawnagg"), 
                   relationship = "many-to-many")
# Filter taxa down to the desired species
groupers <- load_taxa() |> filter(Family == "Epinephelidae")
## A "filtering join" (inner join) 
spawn |> inner_join(groupers)
# A tibble: 227 × 95
   autoctr StockCode SpecCode SpawningRefNo SourceRef C_Code E_CODE
     <int>     <int>    <int>         <int>     <int> <chr>   <int>
 1      18        18       12          5222      3092 528A       NA
 2      19        18       12         26409      1784 388       145
 3      20        20       14         26409        NA 192        NA
 4    9147        20       14        118249    118249 826E        8
 5      22        21       15          5241      5241 630        NA
 6      23        21       15          5241      6484 388        NA
 7      24        21       15          5241      3095 060        NA
 8      24        21       15          5241      3095 060        NA
 9      24        21       15          5241      3095 060        NA
10      24        21       15          5241      3095 060        NA
# ℹ 217 more rows
# ℹ 88 more variables: SpawningGround <chr>, Spawningarea <chr>, Jan <dbl>,
#   Feb <dbl>, Mar <dbl>, Apr <dbl>, May <dbl>, Jun <dbl>, Jul <dbl>,
#   Aug <dbl>, Sep <dbl>, Oct <dbl>, Nov <dbl>, Dec <dbl>, GSI <int>,
#   PercentFemales <int>, TempLow <dbl>, TempHigh <dbl>, SexRatiomid <dbl>,
#   SexRmodRef <int>, FecundityMin <int>, WeightMin <dbl>,
#   LengthFecunMin <dbl>, LengthTypeFecMin <chr>, FecundityRef <int>, …
Always keep in mind that taxonomy is a dynamic concept. Species can be split or lumped based on new evidence, and naming authorities can disagree over which name is an ‘accepted name’ or ‘synonym’ for any given species. When providing your own list of species names, consider first checking that those names are “valid” in the current taxonomy established by FishBase:
validate_names("Abramites ternetzi")
[1] "Abramites hypselonotus"
rfishbase can also provide tables of
synonyms(), a table of common_names() in
multiple languages, and convert between common and scientific names with
common_to_sci() and sci_to_common():
common_to_sci(c("Bicolor cleaner wrasse", "humphead parrotfish"), Language="English")
# A tibble: 5 × 4
  Species                ComName                     Language SpecCode
  <chr>                  <chr>                       <chr>       <int>
1 Labroides bicolor      Bicolor cleaner wrasse      English      5650
2 Chlorurus cyanescens   Blue humphead parrotfish    English      7909
3 Bolbometopon muricatum Green humphead parrotfish   English      5537
4 Bolbometopon muricatum Humphead parrotfish         English      5537
5 Chlorurus oedema       Uniform humphead parrotfish English      8394
Note that the results are returned as a table, potentially indicating other common names for the same species, as well as potentially different species that match the provided common name! Please always be careful with names, and use unique SpecCodes to refer to unique species.
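One way to act on that advice is to check how many distinct SpecCodes a common name resolves to before using it. This is a sketch only, relying on the Species and SpecCode columns shown in the output above; the object names `matches` and `n_species` are illustrative.

```r
library(rfishbase)
library(dplyr)

matches <- common_to_sci("humphead parrotfish", Language = "English")

# Count the distinct species behind the matched common names.
n_species <- n_distinct(matches$SpecCode)
if (n_species > 1) {
  message("Ambiguous common name: ", n_species, " species matched")
}

# Keep one row per species, identified by SpecCode rather than by name.
matches |> distinct(SpecCode, Species)
```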
SeaLifeBase.org is maintained by the same organization and largely
parallels the database structure of Fishbase. As such, almost all
rfishbase functions can instead be instructed to address
the SeaLifeBase database, for example by passing "sealifebase" as the second argument:
fb_tbl("species", "sealifebase")
# A tibble: 102,464 × 111
   SpecCode Genus   Species Author SpeciesRefNo FBname FamCode Subfamily GenCode
      <int> <chr>   <chr>   <chr>         <int> <chr>    <int> <chr>       <int>
 1    57969 Abdopus horrid… (D'Or…        96968 Red S…    1890 Octopodi…   24384
 2    57836 Abdopus tenebr… (Smit…           19 <NA>      1890 Octopodi…   24384
 3    57142 Abdopus tongan… (Hoyl…           19 <NA>      1890 Octopodi…   24384
 4  2381155 Abdopus undula… Huffa…        84307 <NA>      1890 <NA>        24384
 5    14647 Abebai… troglo… Vande…           19 <NA>       572 <NA>         9260
 6   165283 Aberom… muranoi Baces…       104101 <NA>       616 <NA>        33537
 7   140720 Aberra… banyul… Macki…        85340 <NA>       174 <NA>         9262
 8    40346 Aberra… enigma… unspe…           19 <NA>       174 <NA>         9262
 9    20199 Aberra… aberra… (Barn…           19 <NA>       308 <NA>         9263
10    93706 Aberro… verruc… Kasat…         3696 <NA>       922 <NA>        17969
# ℹ 102,454 more rows
# ℹ 102 more variables: TaxIssue <int>, Remark <chr>, PicPreferredName <chr>,
#   PicPreferredNameM <chr>, PicPreferredNameF <chr>, PicPreferredNameJ <chr>,
#   Source <chr>, AuthorRef <int>, SubGenCode <int>, Fresh <int>, Brack <int>,
#   Saltwater <int>, Land <int>, BodyShapeI <chr>, DemersPelag <chr>,
#   Amphibious <chr>, AmphibiousRef <int>, AnaCat <chr>, MigratRef <int>,
#   DepthRangeShallow <int>, DepthRangeDeep <int>, DepthRangeRef <int>, …
By default, tables are downloaded the first time they are used.
rfishbase defaults to downloading the latest available
snapshot; be aware that the most recent snapshot may be months behind
the latest data on fishbase.org. Check available releases:
available_releases()
[1] "19.04" "21.06" "23.01" "23.05" "24.07"
Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.