Text-Statistics-Latin version 0.06
==================================
ABSTRACT
This module performs statistical nalisys (2) of corpora (1).
DESCRIPTION
Given a copus as input, Text::Statistics::Latin creates a seven column
CSV file as output, with one line for each token per text. Names of
input files need match the following pattern:
1 (1). txt', '1 (2). txt', ..., '1 (n).txt'
or
1 \(([1-9]|[1-9][0-9]+)\)\.txt
Columns store statistical information:
(1) number of word forms in document d;
(2) number of tokens in d;
(3) Id number of d, ie., n;
(4) frequency of term t in d;
(5) corpus frequency of t ;
(6) document frequency of t (number of documents where t occurs at
+ least once);
(7) t, UTF8 latin coded token-string delimited by C<< /[ -@]|[\[-`
+]|[{-¿]|[ɐ-˩]|[ʹ-�]/ >>
Main output file name is '1 (n + 5).txt' and it is stored in the s
+ame directory as
the corpus, together with residual files on each input file with .
+txu and .txv ad hoc extensions.
This code was written under CAPES BEX-09323-5
Example:
#!/usr/bin/perl
use strict;
use Text::CStatiBR;
&Text::CStatiBR::CSTATIBR("5"); #5 files are analised.
#Main output
#file created is
#1 (10).txt
INSTALLATION
To install this module type the following:
perl Makefile.PL
make
make test
make install
DEPENDENCIES
This module requires these other modules and libraries:
utf8
Text::ParseWords
SEE ALSO
http://search.cpan.org/~ambs/
http://search.cpan.org/~tpederse/
REFERENCES
(1) BERBER-SARDINHA, Tony. Linguistica de Corpus. Manole, 2004
(2) http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html
COPYRIGHT AND LICENCE
Copyright (C) 2007 by Rodrigo Panchiniak Fernandes
This code was written under CAPES BEX-09323-5
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.