bibutils_4.12/bibutils.dbk
David">
Bremner">
February 10, 2008">
1">
bremner@unb.ca">
BIBUTILS">
Debian">
GNU">
GPL">
]>
&dhemail;
2008
&dhusername; (Manual)
2008
Christopher Putnam (Software and Manual)
This manual page is distributed under the terms of version 2 of
the GNU General Public License.
&dhdate;
&dhucpackage; &dhsection;
&dhpackage;
&version;
User Commands
bibutils
bibliography conversion utilities
format2xml
OPTIONS
file.format
xml2format
OPTIONS
file.xml
DESCRIPTION
The bibutils program set inter-converts between various
bibliography formats using the Library of Congress's
Metadata Object Description Schema (MODS) version 3.1 as an
intermediate. For example, one can convert RIS-format files to
BibTeX by doing two transformations: RIS->MODS->BibTeX.
Converting to MODS
Overview
Command      Description
bib2xml      convert bibtex to MODS
copac2xml    convert COPAC format references to MODS
end2xml      convert EndNote (Refer format) to MODS
endx2xml     convert EndNote XML to MODS
isi2xml      convert ISI web of science to MODS
med2xml      convert Pubmed XML references to MODS
modsclean    a MODS to MODS converter
ris2xml      convert RIS format to MODS
Common Options Converting to MODS
Several flags are available for the end2xml, endx2xml,
bib2xml, ris2xml, med2xml, and copac2xml programs. Most options have both
a short and a long version.
-h, --help                  display help
-v, --version               display version
-a, --add-refcount          add "_#", where # is the reference count, to the
                            reference id
-s, --single-refperfile     put one reference per file, named by the reference
                            number
-i, --input-encoding        interpret the input file as using the requested
                            character set (use without an argument for the
                            current list, derived from the character sets at
                            www.kostis.net); unicode is now a character set
                            option
-u, --unicode-characters    encode unicode characters directly in the file
                            rather than as XML entities
-un, --unicode-no-bom       as -u, but don't include a byte order mark
-d, --drop-key              don't put the citation key in the MODS id field
-c, --corporation-file      takes an argument specifying a file containing a
                            list of corporation names to be placed in
                            <name type="corporate"></name> instead of
                            type="personal"; name mangling is skipped for
                            these names
--verbose                   verbose output
--debug                     very verbose output (mostly for debugging)
bib2xml
bib2xml converts a
bibtex-formatted reference file to an XML-intermediate
bibliography file. Specify file(s) to be converted on the
command line. Files containing bibtex substitutions strings
should be specified before the files where substitutions are
specified (or in the same file before their use). If no files
are specified, then bibtex information will be read from
standard input.
bib2xml bibtex_file.bib > output_file.xml
copac2xml
copac2xml converts a COPAC-formatted
reference file to a MODS XML-intermediate
bibliography file.
end2xml
end2xml
converts a text endnote-formatted reference file to an
XML-intermediate bibliography file. This program will not
work on the binary library; the file needs to be exported
first. Endnote tagged formats ("Refer" format export) look like
.
There are very nice instructions for making sure that
you are properly exporting this at
http://www.sonnysoftware.com/endnoteimport.html
Usage for
end2xml is the same as
bib2xml.
end2xml endnote_file.end > output_file.xml
endx2xml
endx2xml converts an
EndNote-XML exported reference file to a MODS
XML-intermediate bibliography file. This program will not
work on the binary library; the file needs to be exported
first.
isi2xml
isi2xml converts an
ISI-web-of-science-formatted reference file to an
XML-intermediate bibliography file.
Usage for
isi2xml is the same as
bib2xml.
isi2xml input_file.isi > output_file.xml
ris2xml
ris2xml converts a
RIS-formatted reference file to an XML-intermediate
bibliography file.
ris2xml usage is as
end2xml and
bib2xml
ris2xml ris_file.ris > output_file.xml
Converting from MODS
Overview
Command       Description
xml2ads       convert MODS into the SAO/NASA ADS format
xml2bib       convert MODS into bibtex
xml2end       convert MODS into format for EndNote
xml2isi       convert MODS to ISI format
xml2ris       convert MODS into RIS format
xml2wordbib   convert MODS into Word 2007 bibliography format
Common Options Converting from MODS
Note that --output-encoding refers to the input file
-h, --help                  display help
-v, --version               display version
-o, --output-encoding       interpret the input file as using the requested
                            character set (use without an argument for the
                            current list, derived from the character sets at
                            www.kostis.net); unicode is now a character set
                            option
-s, --single-refperfile     put one reference per file, named by the reference
                            number
xml2bib
xml2bib converts the MODS XML
bibliography into a bibtex-formatted reference file.
xml2bib usage is as for other
tools
xml2bib xml_file.xml > output_file.bib
Since the BibTeX reference format is fairly flexible and
seems to have the greatest number of personal preferences, it
has also accumulated a number of specific options that are not
available for other formats.
Starting with 3.24, xml2bib output uses lowercase tags
and mixed case reference types for better interaction with other
software.
The older behavior with all uppercase tags/reference types
can still be generated using the command-line switch
-U/--uppercase.
xml2bib-specific Options:
-fc, --finalcomma    add final comma in the bibtex output for those that
                     want it
-sd, --singledash    use one dash instead of two (longer dash in latex)
                     between numbers in page output
-b, --brackets       use brackets instead of quotation marks around field
                     data
-w, --whitespace     add beautifying whitespace to output
-U, --uppercase      use all uppercase for tags (field names) and reference
                     types (pre-3.24 behavior)
-sk, --strictkey     only use alphanumeric characters for bibtex citation
                     keys
xml2ads
xml2ads converts the MODS XML
bibliography to the Smithsonian Astrophysical Observatory (SAO)/National
Aeronautics and Space Administration (NASA) Astrophysics Data System
(ADS) reference format, which is very similar to the tagged Endnote
style.
xml2ads usage is as for other
tools
xml2ads xml_file.xml > output_file.ads
xml2ris
xml2ris converts the MODS XML
bibliography to RIS-formatted bibliography file.
xml2ris usage is as for other
tools
xml2ris xml_file.xml > output_file.ris
xml2end
xml2end converts the MODS XML
bibliography to tagged Endnote (refer-format) bibliography
file.
xml2end usage is as for other
tools
xml2end xml_file.xml > output_file.end
xml2wordbib
xml2wordbib converts the MODS XML
bibliography to Word 2007-formatted XML bibliography file.
xml2wordbib usage is as for other
tools
xml2wordbib xml_file.xml > output_file.word.xml
xml2wordbib was called xml2word in
versions of bibutils prior to 3.40. It was renamed to avoid confusion
with other tools. Hopefully this will not break too many scripts
already in use.
Examples
Example refer format file
%0 Journal Article
%A C. D. Putnam
%A C. S. Pikaard
%D 1992
%T Cooperative binding of the Xenopus RNA polymerase I
transcription factor xUBF to repetitive ribosomal gene enhancers
%J Mol Cell Biol
%V 12
%P 4970-4980
%F Putnam1992
xml2bib Output Variations
Default
@Article{Putnam1992,
author="C. D. Putnam
and C. S. Pikaard",
year="1992",
month="Nov",
title="Cooperative binding of the
Xenopus RNA polymerase I transcription
factor xUBF to repetitive ribosomal
gene enhancers",
journal="Mol Cell Biol",
volume="12",
pages="4970--4980",
number="11"}
Final Comma
@Article{Putnam1992,
author="C. D. Putnam
and C. S. Pikaard",
year="1992",
month="Nov",
title="Cooperative binding of the
Xenopus RNA polymerase I transcription
factor xUBF to repetitive ribosomal
gene enhancers",
journal="Mol Cell Biol",
volume="12",
pages="4970--4980",
number="11",}
Single Dash
@Article{Putnam1992,
author="C. D. Putnam
and C. S. Pikaard",
year="1992",
month="Nov",
title="Cooperative binding of the
Xenopus RNA polymerase I transcription
factor xUBF to repetitive ribosomal
gene enhancers",
journal="Mol Cell Biol",
volume="12",
pages="4970-4980",
number="11"}
Whitespace
@Article{Putnam1992,
author = "C. D. Putnam
and C. S. Pikaard",
year = "1992",
month = "Jan",
title = "Cooperative binding of
the Xenopus RNA polymerase I transcription
factor xUBF to repetitive ribosomal gene
enhancers",
journal = "Mol Cell Biol",
volume = "12",
pages = "4970--4980"
}
Brackets
@Article{Putnam1992,
author={Putnam, C. D.
and Pikaard, C. S.},
title={Cooperative binding of the Xenopus
RNA polymerase I transcription factor xUBF
to repetitive ribosomal gene enhancers},
journal={Mol Cell Biol},
year={1992},
month={Nov},
volume={12},
number={11},
pages={4970--4980}
}
Uppercase
@ARTICLE{Putnam1992,
AUTHOR="Putnam, C. D.
and Pikaard, C. S.",
TITLE="Cooperative binding of the Xenopus
RNA polymerase I transcription factor xUBF
to repetitive ribosomal gene enhancers",
JOURNAL="Mol Cell Biol",
YEAR="1992",
MONTH="Nov",
VOLUME="12",
NUMBER="11",
PAGES="4970--4980"
}
License
All versions of bibutils are released under the GNU General Public
License (GPL). In a nutshell, feel free to download, run, and
modify these programs as required. If you re-release them, you
need to release the modified version of the source. (And I'd
appreciate patches as well: if you care enough to make the
change, then I'd like to see what you're adding or
fixing.)
Chris Putnam, Ludwig Institute for Cancer Research
bibutils_4.12/maketgz.csh
#!/bin/csh -f
#
# $1 = version number
# $2 = postfix
#
#
set programs = ( biblatex2xml bib2xml copac2xml ebi2xml end2xml endx2xml isi2xml med2xml wordbib2xml modsclean ris2xml xml2ads xml2bib xml2end xml2isi xml2ris xml2wordbib )
set VERSION = $1
set POSTFIX = $2
if ( ! (-e update) ) mkdir update
if ( -e update/bibutils_${VERSION} ) /bin/rm -r update/bibutils_${VERSION}
mkdir update/bibutils_${VERSION}
foreach p ( $programs )
cp bin/${p} update/bibutils_${VERSION}/${p}
end
cd update
tar cvf - bibutils_${VERSION} | gzip - > bibutils_${VERSION}${POSTFIX}.tgz
cd ..
rm -r update/bibutils_${VERSION}
bibutils_4.12/configure
#!/bin/csh -f
set INPUT_FILE = Makefile_start
set OUTPUT_FILE = Makefile
set LIBTYPE = static
set INSTALLDIR = /usr/local/bin
set LIBINSTALLDIR = /usr/local/lib
while ( ${#argv} > 0 )
if ( $1 == "--install-dir" ) then
if ( ${#argv} < 2 ) then
echo "--install-dir requires a directory"
exit
else
shift
set INSTALLDIR = $1
shift
endif
else if ( $1 == "--install-lib" ) then
if ( ${#argv} < 2 ) then
echo "--install-lib requires a directory"
exit
else
shift
set LIBINSTALLDIR = $1
shift
endif
else if ( $1 == "--dynamic" ) then
set LIBTYPE = dynamic
shift
else if ( $1 == "--static" ) then
set LIBTYPE = static
shift
else
echo "Unidentified argument $1"
exit
endif
end
if ( "$LIBTYPE" == "dynamic" ) then
cp lib/Makefile.dynamic lib/Makefile
cp bin/Makefile.dynamic bin/Makefile
else
cp lib/Makefile.static lib/Makefile
cp bin/Makefile.static bin/Makefile
endif
set type = "Unknown"
set universal_binary = "FALSE"
set UNAME = `uname -a`
if ( ` echo $UNAME | grep Linux | wc | awk '{print $1;}' ` == 1 ) then
if ( ` echo $UNAME | grep 'i[3456]86' | wc | awk '{print $1};'` == 1 ) then
set type = "Linux_x86"
else if ( ` echo $UNAME | grep 'x86_64' | wc | awk '{print $1};'` == 1 ) then
set type = "Linux_x86_64"
else
set type = "Linux_Unknown"
endif
endif
if ( ` echo $UNAME | grep Darwin | wc | awk '{print $1;}' ` == 1 ) then
set type = "MacOSX_Unknown"
if ( ` echo $UNAME | grep -E 'powerpc|Power Macintosh' | wc | awk '{print $1};'` == 1 ) then
set type = "MacOSX_ppc"
endif
if ( ` echo $UNAME | grep 'i386' | wc | awk '{print $1}'` == 1 ) then
set type = "MacOSX_intel"
endif
endif
if ( ` echo $UNAME | grep SunOS | wc | awk '{print $1;}' ` == 1 ) then
set type = "SunOS5"
endif
if ( ` echo $UNAME | grep IRIX | wc | awk '{print $1;}' ` == 1 ) then
set type = "IRIX"
endif
if ( ` echo $UNAME | grep NetBSD | wc | awk '{print $1;}' ` == 1 ) then
set type = "NetBSD"
endif
if ( ` echo $UNAME | grep FreeBSD | wc | awk '{print $1;}' ` == 1 ) then
set type = "FreeBSD"
endif
if ( ` echo $UNAME | grep Cygwin | wc | awk '{print $1;}' ` == 1 ) then
set type = "Cygwin"
endif
#
# Support universal binaries for MacOSX's (gcc version 4 and higher)
#
# restrict to intel Mac's only because ppc Mac's I have access to
# just don't have the i386 libraries...
#
#if ( $type == "MacOSX_ppc" || $type == "MacOSX_intel" ) then
if ( $type == "MacOSX_intel" ) then
gcc -v >& tmp.$$
set gcc_version = ` grep version tmp.$$ | awk '{print $3;}' `
set gcc_major = ` echo $gcc_version | awk -v FS="." '{print $1;}' `
if ( $gcc_major > 3 ) then
set universal_binary = "TRUE"
endif
/bin/rm -f tmp.$$
endif
if ( $type == "Linux_x86" ) then
set CC = '"cc -Wall"'
set RANLIB = '"ranlib"'
set POSTFIX = '"_i386"'
else if ( $type == "Linux_x86_64" ) then
set CC = '"cc -Wall"'
set RANLIB = '"ranlib"'
set POSTFIX = '"_amd64"'
else if ( $type == "Linux_Unknown" ) then
set CC = '"cc -Wall"'
set RANLIB = '"ranlib"'
set POSTFIX = '""'
else if ( $type == "MacOSX_ppc" && $universal_binary == "TRUE" ) then
set CC = '"cc -arch i386 -arch ppc -Wall"'
set RANLIB = '"ranlib -s"'
set POSTFIX = '"_osx_universal"'
else if ( $type == "MacOSX_intel" && $universal_binary == "TRUE" ) then
set CC = '"cc -arch i386 -arch ppc -Wall"'
set RANLIB = '"ranlib -s"'
set POSTFIX = '"_osx_universal"'
else if ( $type == "MacOSX_ppc" || $type == "MacOSX_intel" || \
$type == "MacOSX_Unknown" ) then
set CC = '"cc -Wall"'
set RANLIB = '"ranlib -s"'
set POSTFIX = '"_osx"'
else if ( $type == "SunOS5" ) then
set CC = '"gcc -Wall"'
set RANLIB = '"echo Skipping ranlib"'
set POSTFIX = '"_sunos5"'
else if ( $type == "IRIX" ) then
set CC = '"gcc -Wall"'
set RANLIB = '"echo Skipping ranlib"'
set POSTFIX = '"_irix"'
else if ( $type == "NetBSD" ) then
set CC = '"cc -Wall"'
set RANLIB = '"ranlib"'
set POSTFIX = '"_netbsd"'
else if ( $type == "FreeBSD" ) then
set CC = '"cc -Wall"'
set RANLIB = '"ranlib"'
set POSTFIX = '"_freebsd"'
else if ( $type == "Cygwin" ) then
set CC = '"cc -Wall"'
set RANLIB = '"echo Skipping ranlib"'
set POSTFIX = '"_cygwin"'
else
# Unknown operating system
set CC = '"cc"'
set RANLIB = '"echo Skipping ranlib"'
set POSTFIX = '""'
endif
cat $INPUT_FILE | \
sed "s/REPLACE_CC/CC=${CC}/" | \
sed "s/REPLACE_RANLIB/RANLIB=${RANLIB}/" | \
sed "s|REPLACE_INSTALLDIR|${INSTALLDIR}|" | \
sed "s|REPLACE_LIBINSTALLDIR|${LIBINSTALLDIR}|" | \
sed "s/REPLACE_POSTFIX/${POSTFIX}/" > $OUTPUT_FILE
echo
echo
echo "Bibutils Configuration"
echo "----------------------"
echo
echo "Operating system: $type"
echo "Library and binary type: $LIBTYPE"
echo "Binary installation directory: $INSTALLDIR"
echo "Library installation directory: $LIBINSTALLDIR"
echo
echo " - If auto-identification of operating system failed, e-mail cdputnam@ucsd.edu"
echo " with the output of the command: uname -a"
echo
echo " - Use --static or --dynamic to specify library and binary type;"
echo " the --static option is the default"
echo
echo " - Set binary installation directory with: --install-dir DIR"
echo
echo " - Set library installation directory with: --install-lib DIR"
echo
echo
if ( $OUTPUT_FILE == "Makefile" ) then
echo "To compile, type: make"
echo "To install, type: make install"
echo "To make tgz package, type: make package"
echo "To make deb package, type: make deb"
echo
echo "To clean up temporary files, type: make clean"
echo "To clean up all files, type: make realclean"
else
echo "To compile, type: make -f $OUTPUT_FILE"
echo "To install, type: make -f $OUTPUT_FILE install"
echo "To make tgz package, type: make -f $OUTPUT_FILE package"
echo "To make deb package, type: make -f $OUTPUT_FILE deb"
echo
echo "To clean up temporary files, type: make -f $OUTPUT_FILE clean"
echo "To clean up all files, type: make -f $OUTPUT_FILE realclean"
endif
echo
echo
bibutils_4.12/lib/is_ws.h
/*
* is_ws.h
*
* Copyright (c) Chris Putnam 2003-2009
*
* Source code released under the GPL
*
*/
#ifndef IS_WS_H
#define IS_WS_H
extern int is_ws( char ch );
extern char *skip_ws( char *p );
extern char *skip_notws( char *p );
#endif
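/*
 * The header above only declares is_ws(), skip_ws(), and skip_notws();
 * their definitions are not part of this excerpt. The sketch below shows
 * one plausible shape for them. The "_sketch" names and the exact set of
 * characters treated as whitespace (space, tab, newline, carriage return)
 * are assumptions made for illustration, not the library's code.
 */

```c
/* Assumed definition: whitespace is space, tab, newline, or carriage return. */
int is_ws_sketch( char ch )
{
	return ( ch==' ' || ch=='\t' || ch=='\n' || ch=='\r' );
}

/* Advance past leading whitespace; stops at the terminating NUL. */
char *skip_ws_sketch( char *p )
{
	if ( p ) while ( *p && is_ws_sketch( *p ) ) p++;
	return p;
}

/* Advance past a run of non-whitespace characters; stops at the NUL. */
char *skip_notws_sketch( char *p )
{
	if ( p ) while ( *p && !is_ws_sketch( *p ) ) p++;
	return p;
}
```

/*
 * Alternating the two skip functions walks a line token by token:
 * skip_ws_sketch() finds the start of a token, skip_notws_sketch() its end.
 */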
bibutils_4.12/lib/wordout.c
/*
* wordout.c
*
* (Word 2007 format)
*
* Copyright (c) Chris Putnam 2007-2010
*
* Source code released under the GPL
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "newstr.h"
#include "fields.h"
#include "utf8.h"
#include "wordout.h"
void
wordout_initparams( param *p, const char *progname )
{
p->writeformat = BIBL_WORD2007OUT;
p->format_opts = 0;
p->charsetout = BIBL_CHARSET_UNICODE;
p->charsetout_src = BIBL_SRC_DEFAULT;
p->latexout = 0;
p->utf8out = 0;
p->utf8bom = 0;
if ( !p->utf8out ) p->xmlout = 3;
else p->xmlout = 1;
p->nosplittitle = 0;
p->verbose = 0;
p->addcount = 0;
p->singlerefperfile = 0;
p->headerf = wordout_writeheader;
p->footerf = wordout_writefooter;
p->writef = wordout_write;
}
typedef struct convert {
char oldtag[25];
char newtag[25];
int code;
} convert;
typedef struct outtype {
int value;
char *out;
} outtype;
/*
At the moment 18 unique types of sources are defined:
{code}
Art
ArticleInAPeriodical
Book
BookSection
Case
Conference
DocumentFromInternetSite
ElectronicSource
Film
InternetSite
Interview
JournalArticle
Report
Misc
Patent
Performance
Proceedings
SoundRecording
{code}
*/
enum {
TYPE_UNKNOWN = 0,
TYPE_ART,
TYPE_ARTICLEINAPERIODICAL,
TYPE_BOOK,
TYPE_BOOKSECTION,
TYPE_CASE,
TYPE_CONFERENCE,
TYPE_DOCUMENTFROMINTERNETSITE,
TYPE_ELECTRONICSOURCE,
TYPE_FILM,
TYPE_INTERNETSITE,
TYPE_INTERVIEW,
TYPE_JOURNALARTICLE,
TYPE_MISC,
TYPE_PATENT,
TYPE_PERFORMANCE,
TYPE_PROCEEDINGS,
TYPE_REPORT,
TYPE_SOUNDRECORDING,
TYPE_THESIS,
TYPE_MASTERSTHESIS,
TYPE_PHDTHESIS,
};
/*
* fixed output
*/
static void
output_fixed( FILE *outptr, char *tag, char *data, int level )
{
int i;
for ( i=0; i<level; ++i ) fprintf( outptr, " " );
fprintf( outptr, "<%s>%s</%s>\n", tag, data, tag );
}
/* detail output
*
*/
static void
output_item( fields *info, FILE *outptr, char *tag, int item, int level )
{
int i;
if ( item==-1 ) return;
for ( i=0; i<level; ++i ) fprintf( outptr, " " );
fprintf( outptr, "<%s>%s</%s>\n", tag, info->data[item].data, tag );
fields_setused( info, item );
}
/* range output
*
* start-end
*
*/
static void
output_range( fields *info, FILE *outptr, char *tag, int start, int end,
int level )
{
int i;
if ( start==-1 && end==-1 ) return;
if ( start==-1 )
output_item( info, outptr, tag, end, 0 );
else if ( end==-1 )
output_item( info, outptr, tag, start, 0 );
else {
for ( i=0; i<level; ++i ) fprintf( outptr, " " );
fprintf( outptr, "<%s>%s-%s</%s>\n", tag,
info->data[start].data, info->data[end].data, tag );
fields_setused( info, start );
fields_setused( info, end );
}
}
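/*
 * The fallback rules of output_range() (nothing when both ends are
 * missing, a single value when only one is present, "start-end"
 * otherwise) can be tried in isolation. The sketch below is a standalone
 * illustration: format_range() is a hypothetical helper that takes plain
 * strings (NULL meaning "missing") instead of indices into a fields
 * structure, and it assumes the element is written as <tag>...</tag>.
 */

```c
#include <stdio.h>

/* Writes the element into buf. Returns the snprintf() result, or -1
 * when both ends are missing and nothing is written. */
int format_range( char *buf, size_t n, const char *tag,
		const char *start, const char *end )
{
	if ( !start && !end ) return -1;
	if ( !start )
		return snprintf( buf, n, "<%s>%s</%s>\n", tag, end, tag );
	if ( !end )
		return snprintf( buf, n, "<%s>%s</%s>\n", tag, start, tag );
	return snprintf( buf, n, "<%s>%s-%s</%s>\n", tag, start, end, tag );
}
```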
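static void
output_list( fields *info, FILE *outptr, convert *c, int nc )
{
int i, n;
for ( i=0; i<nc; ++i ) {
n = fields_find( info, c[i].oldtag, c[i].code );
if ( n!=-1 ) output_item( info, outptr, c[i].newtag, n, 0 );
}
}
static int
get_type_from_genre( fields *info )
{
int type = TYPE_UNKNOWN, i, j, level;
char *genre;
for ( i=0; i<info->nfields; ++i ) {
if ( strcasecmp( info->tag[i].data, "GENRE" ) &&
strcasecmp( info->tag[i].data, "NGENRE" ) ) continue;
genre = info->data[i].data;
/* a loop over j followed here; its body is not recoverable from this copy */
if ( type==TYPE_UNKNOWN ) {
level = info->level[i];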
if ( !strcasecmp( genre, "academic journal" ) ) {
type = TYPE_JOURNALARTICLE;
}
else if ( !strcasecmp( genre, "periodical" ) ) {
if ( type == TYPE_UNKNOWN )
type = TYPE_ARTICLEINAPERIODICAL;
}
else if ( !strcasecmp( genre, "book" ) ||
!strcasecmp( genre, "collection" ) ) {
if ( info->level[i]==0 ) type = TYPE_BOOK;
else type = TYPE_BOOKSECTION;
}
else if ( !strcasecmp( genre, "conference publication" ) ) {
if ( level==0 ) type = TYPE_CONFERENCE;
else type = TYPE_PROCEEDINGS;
}
else if ( !strcasecmp( genre, "thesis" ) ) {
if ( type==TYPE_UNKNOWN ) type=TYPE_THESIS;
}
else if ( !strcasecmp( genre, "Ph.D. thesis" ) ) {
type = TYPE_PHDTHESIS;
}
else if ( !strcasecmp( genre, "Masters thesis" ) ) {
type = TYPE_MASTERSTHESIS;
}
}
}
return type;
}
static int
get_type_from_resource( fields *info )
{
int type = TYPE_UNKNOWN, i;
char *resource;
for ( i=0; i<info->nfields; ++i ) {
if ( strcasecmp( info->tag[i].data, "GENRE" )!=0 &&
strcasecmp( info->tag[i].data, "NGENRE" )!=0 ) continue;
resource = info->data[i].data;
if ( !strcasecmp( resource, "moving image" ) )
type = TYPE_FILM;
}
return type;
}
static int
get_type( fields *info )
{
int type;
type = get_type_from_genre( info );
if ( type==TYPE_UNKNOWN )
type = get_type_from_resource( info );
return type;
}
static void
output_titleinfo( fields *info, FILE *outptr, char *tag, int level )
{
char *p;
int ttl, subttl;
ttl = fields_find( info, "TITLE", level );
subttl = fields_find( info, "SUBTITLE", level );
if ( ttl!=-1 || subttl!=-1 ) {
fprintf( outptr, "<%s>", tag );
if ( ttl!=-1 ) {
fprintf( outptr, "%s", info->data[ttl].data );
fields_setused( info, ttl );
}
if ( subttl!=-1 ) {
if ( ttl!=-1 ) {
p = info->data[ttl].data;
if ( p[info->data[ttl].len-1]!='?' )
fprintf( outptr, ":" );
fprintf( outptr, " " );
}
fprintf( outptr, "%s", info->data[subttl].data );
fields_setused( info, subttl );
}
fprintf( outptr, "</%s>\n", tag );
}
}
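/*
 * The joining rule in output_titleinfo() is easy to miss: a subtitle is
 * appended after ": ", except when the title already ends in a question
 * mark, in which case only a space is used. The sketch below restates
 * that rule on plain strings. join_title() is a hypothetical helper and
 * not part of bibutils; unlike the function above it also guards against
 * an empty title string.
 */

```c
#include <stdio.h>
#include <string.h>

/* Joins title and subtitle into buf; either may be NULL.
 * Returns the snprintf() result, or -1 when both are missing. */
int join_title( char *buf, size_t n, const char *title, const char *subtitle )
{
	size_t len;
	if ( !title && !subtitle ) return -1;
	if ( !subtitle ) return snprintf( buf, n, "%s", title );
	if ( !title ) return snprintf( buf, n, "%s", subtitle );
	len = strlen( title );
	/* No colon after a title that ends in '?' */
	if ( len > 0 && title[len-1]=='?' )
		return snprintf( buf, n, "%s %s", title, subtitle );
	return snprintf( buf, n, "%s: %s", title, subtitle );
}
```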
static void
output_title( fields *info, FILE *outptr, int level )
{
int ttl = fields_find( info, "TITLE", level );
int subttl = fields_find( info, "SUBTITLE", level );
int shrttl = fields_find( info, "SHORTTITLE", level );
output_titleinfo( info, outptr, "b:Title", 0 );
/* output shorttitle if it's different from normal title */
if ( shrttl!=-1 ) {
if ( ttl==-1 || subttl!=-1 ||
strcmp(info->data[ttl].data,info->data[shrttl].data) ) {
fprintf( outptr, " <b:ShortTitle>" );
fprintf( outptr, "%s", info->data[shrttl].data );
fprintf( outptr, "</b:ShortTitle>\n" );
}
fields_setused( info, shrttl );
}
}
static void
output_name_nomangle( FILE *outptr, char *p )
{
fprintf( outptr, "<b:Person>" );
fprintf( outptr, "<b:Last>%s</b:Last>", p );
fprintf( outptr, "</b:Person>\n" );
}
static void
output_name( FILE *outptr, char *p )
{
newstr family, part;
int n=0, npart=0;
newstr_init( &family );
while ( *p && *p!='|' ) newstr_addchar( &family, *p++ );
if ( *p=='|' ) p++;
if ( family.len ) {
fprintf( outptr, "<b:Person>" );
fprintf( outptr, "<b:Last>%s</b:Last>",family.data );
n++;
}
newstr_free( &family );
newstr_init( &part );
while ( *p ) {
while ( *p && *p!='|' ) newstr_addchar( &part, *p++ );
if ( part.len ) {
if ( n==0 ) fprintf( outptr, "<b:Person>" );
if ( npart==0 )
fprintf( outptr, "<b:First>%s</b:First>",
part.data );
else fprintf( outptr, "<b:Middle>%s</b:Middle>",
part.data );
n++;
npart++;
}
if ( *p=='|' ) {
p++;
newstr_empty( &part );
}
}
if ( n ) fprintf( outptr, "</b:Person>\n" );
newstr_free( &part );
}
#define NAME (1)
#define NAME_ASIS (2)
#define NAME_CORP (4)
static int
extract_name_and_info( newstr *outtag, newstr *intag )
{
int code = NAME;
newstr_newstrcpy( outtag, intag );
if ( newstr_findreplace( outtag, ":ASIS", "" ) ) code = NAME_ASIS;
if ( newstr_findreplace( outtag, ":CORP", "" ) ) code = NAME_CORP;
return code;
}
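/*
 * extract_name_and_info() strips an optional ":ASIS" or ":CORP" marker
 * from a tag and reports which one it found. The sketch below shows the
 * same idea on a plain char buffer. strip_marker() and
 * classify_name_tag() are hypothetical names; newstr_findreplace() is
 * not reproduced here, and this version removes only the first
 * occurrence of a marker, which is an assumption about what is needed.
 */

```c
#include <string.h>

enum { SK_NAME = 1, SK_NAME_ASIS = 2, SK_NAME_CORP = 4 };

/* Remove the first occurrence of marker from tag, in place.
 * Returns 1 if something was removed, 0 otherwise. */
static int strip_marker( char *tag, const char *marker )
{
	char *hit = strstr( tag, marker );
	size_t mlen = strlen( marker );
	if ( !hit ) return 0;
	/* shift the tail (including the NUL) over the marker */
	memmove( hit, hit + mlen, strlen( hit + mlen ) + 1 );
	return 1;
}

/* Returns the name code for tag and leaves the bare tag in the buffer. */
int classify_name_tag( char *tag )
{
	int code = SK_NAME;
	if ( strip_marker( tag, ":ASIS" ) ) code = SK_NAME_ASIS;
	if ( strip_marker( tag, ":CORP" ) ) code = SK_NAME_CORP;
	return code;
}
```

/*
 * After the call, a tag such as "AUTHOR:CORP" can be compared directly
 * against the plain role names, which is how output_name_type() uses the
 * stripped tag.
 */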
static void
output_name_type( fields *info, FILE *outptr, int level,
char *map[], int nmap, char *tag )
{
newstr ntag;
int i, j, n=0, code;
newstr_init( &ntag );
for ( j=0; j<nmap; ++j ) {
for ( i=0; i<info->nfields; ++i ) {
code = extract_name_and_info( &ntag, &(info->tag[i]) );
if ( strcasecmp( ntag.data, map[j] ) ) continue;
if ( n==0 )
fprintf( outptr, "<%s><b:NameList>\n", tag );
if ( code != NAME )
output_name_nomangle( outptr, info->data[i].data );
else
output_name( outptr, info->data[i].data );
fields_setused( info, i );
n++;
}
}
newstr_free( &ntag );
if ( n )
fprintf( outptr, "</b:NameList></%s>\n", tag );
}
static void
output_names( fields *info, FILE *outptr, int level, int type )
{
char *authors[] = { "AUTHOR", "WRITER", "ASSIGNEE", "ARTIST",
"CARTOGRAPHER", "INVENTOR", "ORGANIZER", "DIRECTOR",
"PERFORMER", "REPORTER", "TRANSLATOR", "RECIPIENT",
"2ND_AUTHOR", "3RD_AUTHOR", "SUB_AUTHOR", "COMMITTEE",
"COURT", "LEGISLATIVEBODY" };
int nauthors = sizeof( authors ) / sizeof( authors[0] );
char *editors[] = { "EDITOR" };
int neditors = sizeof( editors ) / sizeof( editors[0] );
char author_default[] = "b:Author", inventor[] = "b:Inventor";
char *author_type = author_default;
if ( type == TYPE_PATENT ) author_type = inventor;
fprintf( outptr, "<b:Author>\n" );
output_name_type( info, outptr, level, authors, nauthors, author_type );
output_name_type( info, outptr, level, editors, neditors, "b:Editor" );
fprintf( outptr, "</b:Author>\n" );
}
static void
output_date( fields *info, FILE *outptr, int level )
{
convert parts[3] = {
{ "PARTYEAR", "b:Year", -1 },
{ "PARTMONTH", "b:Month", -1 },
{ "PARTDAY", "b:Day", -1 }
};
convert fulls[3] = {
{ "YEAR", "", -1 },
{ "MONTH", "", -1 },
{ "DAY", "", -1 }
};
int i, np, nf;
for ( i=0; i<3; ++i ) {
np = fields_find( info, parts[i].oldtag, level );
nf = fields_find( info, fulls[i].oldtag, level );
if ( np!=-1 )
output_item( info, outptr, parts[i].newtag, np, 0 );
else if ( nf!=-1 )
output_item( info, outptr, parts[i].newtag, nf, 0 );
}
}
static void
output_pages( fields *info, FILE *outptr, int level )
{
int start = fields_find( info, "PAGESTART", -1 );
int end = fields_find( info, "PAGEEND", -1 );
int ar = fields_find( info, "ARTICLENUMBER", -1 );
if ( start!=-1 || end!=-1 )
output_range( info, outptr, "b:Pages", start, end, level );
else if ( ar!=-1 )
output_range( info, outptr, "b:Pages", ar, -1, level );
}
static void
output_includedin( fields *info, FILE *outptr, int type )
{
if ( type==TYPE_JOURNALARTICLE ) {
output_titleinfo( info, outptr, "b:JournalName", 1 );
} else if ( type==TYPE_ARTICLEINAPERIODICAL ) {
output_titleinfo( info, outptr, "b:PeriodicalName", 1 );
} else if ( type==TYPE_BOOKSECTION ) {
output_titleinfo( info, outptr, "b:ConferenceName", 1 ); /*??*/
} else if ( type==TYPE_PROCEEDINGS ) {
output_titleinfo( info, outptr, "b:ConferenceName", 1 );
}
}
static int
type_is_thesis( int type )
{
if ( type==TYPE_THESIS || type==TYPE_PHDTHESIS ||
type==TYPE_MASTERSTHESIS )
return 1;
else return 0;
}
static void
output_thesisdetails( fields *info, FILE *outptr, int type )
{
int i;
if ( type==TYPE_PHDTHESIS )
output_fixed( outptr, "b:ThesisType", "Ph.D. Thesis", 0 );
else if ( type==TYPE_MASTERSTHESIS )
output_fixed( outptr, "b:ThesisType", "Masters Thesis", 0 );
for ( i=0; i<info->nfields; ++i ) {
if ( strcasecmp( info->tag[i].data, "DEGREEGRANTOR" ) &&
strcasecmp( info->tag[i].data, "DEGREEGRANTOR:ASIS") &&
strcasecmp( info->tag[i].data, "DEGREEGRANTOR:CORP"))
continue;
output_item( info, outptr, "b:Institution", i, 0 );
}
}
static
outtype types[] = {
{ TYPE_UNKNOWN, "Misc" },
{ TYPE_MISC, "Misc" },
{ TYPE_BOOK, "Book" },
{ TYPE_BOOKSECTION, "BookSection" },
{ TYPE_CASE, "Case" },
{ TYPE_CONFERENCE, "Conference" },
{ TYPE_ELECTRONICSOURCE, "ElectronicSource" },
{ TYPE_FILM, "Film" },
{ TYPE_INTERNETSITE, "InternetSite" },
{ TYPE_INTERVIEW, "Interview" },
{ TYPE_SOUNDRECORDING, "SoundRecording" },
{ TYPE_ARTICLEINAPERIODICAL, "ArticleInAPeriodical" },
{ TYPE_DOCUMENTFROMINTERNETSITE, "DocumentFromInternetSite" },
{ TYPE_JOURNALARTICLE, "JournalArticle" },
{ TYPE_REPORT, "Report" },
{ TYPE_PATENT, "Patent" },
{ TYPE_PERFORMANCE, "Performance" },
{ TYPE_PROCEEDINGS, "Proceedings" },
};
static
int ntypes = sizeof( types ) / sizeof( types[0] );
static void
output_type( fields *info, FILE *outptr, int type )
{
int i, found = 0;
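fprintf( outptr, "<b:SourceType>" );
for ( i=0; i<ntypes && !found; ++i ) {
if ( types[i].value!=type ) continue;
found = 1;
fprintf( outptr, "%s", types[i].out );
}
if ( !found ) {
if ( type_is_thesis( type ) ) fprintf( outptr, "Report" );
else fprintf( outptr, "Misc" );
}
fprintf( outptr, "</b:SourceType>\n" );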
if ( type_is_thesis( type ) )
output_thesisdetails( info, outptr, type );
}
static void
output_comments( fields *info, FILE *outptr, int level )
{
int i, written=0;
int nabs = fields_find( info, "ABSTRACT", level );
if ( nabs!=-1 ) {
fprintf( outptr, "<b:Comments>" );
fprintf( outptr, "%s", info->data[nabs].data );
written = 1;
}
for ( i=0; i<info->nfields; ++i ) {
if ( info->level[i]!=level ) continue;
if ( strcasecmp( info->tag[i].data, "NOTES" ) ) continue;
if ( !written ) {
fprintf( outptr, "<b:Comments>" );
written = 1;
}
fprintf( outptr, "%s", info->data[i].data );
}
if ( written ) fprintf( outptr, "</b:Comments>\n" );
}
static void
output_bibkey( fields *info, FILE *outptr )
{
int n = fields_find( info, "REFNUM", -1 );
if ( n==-1 ) n = fields_find( info, "BIBKEY", -1 );
output_item( info, outptr, "b:Tag", n, 0 );
}
static void
output_citeparts( fields *info, FILE *outptr, int level, int max, int type )
{
convert origin[] = {
{ "ADDRESS", "b:City", -1 },
{ "PUBLISHER", "b:Publisher", -1 },
{ "EDITION", "b:Edition", -1 }
};
int norigin = sizeof( origin ) / sizeof ( convert );
convert parts[] = {
{ "VOLUME", "b:Volume", -1 },
{ "SECTION", "b:Section", -1 },
{ "ISSUE", "b:Issue", -1 },
{ "NUMBER", "b:Issue", -1 },
{ "PUBLICLAWNUMBER", "b:Volume", -1 },
{ "SESSION", "b:Issue", -1 },
};
int nparts=sizeof(parts)/sizeof(convert);
output_bibkey( info, outptr );
output_type( info, outptr, type );
output_list( info, outptr, origin, norigin );
output_date( info, outptr, level );
output_includedin( info, outptr, type );
output_list( info, outptr, parts, nparts );
output_pages( info, outptr, level );
output_names( info, outptr, level, type );
output_title( info, outptr, level );
output_comments( info, outptr, level );
}
void
wordout_write( fields *info, FILE *outptr, param *p, unsigned long numrefs )
{
int max = fields_maxlevel( info );
int type = get_type( info );
fprintf( outptr, "<b:Source>\n" );
output_citeparts( info, outptr, -1, max, type );
fprintf( outptr, "</b:Source>\n" );
fflush( outptr );
}
void
wordout_writeheader( FILE *outptr, param *p )
{
if ( p->utf8bom ) utf8_writebom( outptr );
fprintf(outptr,"\n");
fprintf(outptr,"\n");
}
void
wordout_writefooter( FILE *outptr )
{
fprintf(outptr,"\n");
fflush( outptr );
}
bibutils_4.12/lib/fields.h
/*
* fields.h
*
* Copyright (c) Chris Putnam 2003-2009
*
* Source code released under the GPL
*
*/
#ifndef FIELDS_H
#define FIELDS_H
#define LEVEL_MAIN (0)
#define LEVEL_HOST (1)
#define LEVEL_SERIES (2)
#define LEVEL_ORIG (-2)
#include "newstr.h"
typedef struct {
newstr *tag;
newstr *data;
int *used;
int *level;
int nfields;
int maxfields;
} fields;
extern int fields_add( fields *info, char *tag, char *data, int level );
extern int fields_add_tagsuffix( fields *info, char *tag, char *suffix,
char *data, int level );
extern void fields_free( fields *info );
extern void fields_init( fields *info );
extern fields *fields_new( void );
extern int fields_find( fields *info, char *searchtag, int level );
extern int fields_find_firstof( fields *info, char *tags[], int ntags,
int level );
extern int fields_maxlevel( fields *info );
extern void fields_clearused( fields *info );
extern void fields_setused( fields *info, int n );
extern void fields_replace_or_add( fields *info, char *tag, char *data, int level );
#endif
bibutils_4.12/lib/entities.h
/*
* entities.h
*
* Copyright (c) Chris Putnam 2003-2009
*
* Source code released under the GPL
*/
#ifndef ENTITIES_H
#define ENTITIES_H
extern unsigned int decode_entity( char *s, unsigned int *pi,
int *unicode, int *err );
#endif
bibutils_4.12/lib/Makefile.dynamic
#CC = gcc -Wall
SIMPLE_OBJS = is_ws.o strsearch.o charsets.o
NEWSTR_OBJS = newstr.o newstr_conv.o entities.o latex.o utf8.o gb18030.o
CONTAIN_OBJS = fields.o list.o xml.o xml_encoding.o
BIBL_OBJS = name.o title.o doi.o bibl.o serialno.o marc.o reftypes.o bibl.o
INPUT_OBJS = bibtexin.o bibtextypes.o \
biblatexin.o bltypes.o \
copacin.o copactypes.o \
endin.o endtypes.o \
endxmlin.o \
isiin.o isitypes.o \
medin.o \
modsin.o modstypes.o marc.o \
risin.o ristypes.o \
ebiin.o wordin.o
OUTPUT_OBJS = bibtexout.o endout.o isiout.o modsout.o risout.o wordout.o \
adsout.o
BIBCORE_OBJS = $(SIMPLE_OBJS) $(NEWSTR_OBJS) $(CONTAIN_OBJS) $(BIBL_OBJS) \
bibcore.o
BIBUTILS_OBJS = $(INPUT_OBJS) $(OUTPUT_OBJS) bibutils.o
all: libbibutils.so
.c.o:
$(CC) -fPIC -c $(CFLAGS) -o $@ $<
libbibutils.so: $(BIBCORE_OBJS) $(BIBUTILS_OBJS)
$(CC) -shared -Wl,-soname,libbibutils.so.4 -o libbibutils.so.4.12 $^
ln -s libbibutils.so.4.12 libbibutils.so.4
ln -s libbibutils.so.4.12 libbibutils.so
install:
echo INSTALLING LIBRARIES TO $(LIBINSTALLDIR)
cp libbibutils.so.4.12 $(LIBINSTALLDIR)
ln -sf $(LIBINSTALLDIR)/libbibutils.so.4.12 $(LIBINSTALLDIR)/libbibutils.so
ln -sf $(LIBINSTALLDIR)/libbibutils.so.4.12 $(LIBINSTALLDIR)/libbibutils.so.4
clean:
/bin/rm -f *.o core
realclean:
/bin/rm -f *.o *.so *.so.4 *.so.4.12 core
test:
bibutils_4.12/lib/marc.h
/*
* marc.h
*
* Copyright (c) Chris Putnam 2008-9
*
* Program and source code released under the GPL
*
*/
#ifndef MARC_H
#define MARC_H
extern int marc_findgenre( char *query );
extern int marc_findresource( char *query );
#endif
bibutils_4.12/lib/name.c
/*
* name.c
*
* mangle names w/ and w/o commas
*
* Copyright (c) Chris Putnam 2004-2010
*
* Source code released under the GPL
*
*/
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include "is_ws.h"
#include "newstr.h"
#include "fields.h"
#include "list.h"
#include "name.h"
static void
check_case( char *start, char *end, int *upper, int *lower )
{
int u = 0, l = 0;
char *p = start;
while ( p < end ) {
if ( islower( *p ) ) l = 1;
else if ( isupper( *p ) ) u = 1;
p++;
}
*upper = u;
*lower = l;
}
static int
should_split( char *last_start, char *last_end, char *first_start,
char *first_end )
{
int upperlast, lowerlast, upperfirst, lowerfirst;
check_case( first_start, first_end, &upperfirst, &lowerfirst );
check_case( last_start, last_end, &upperlast, &lowerlast );
if ( ( upperlast && lowerlast ) && ( upperfirst && !lowerfirst ) )
return 1;
else return 0;
}
/* name_addmultibytechar
*
* Add character to newstring s starting at pointer p.
*
* Handles the case for multibyte Unicode chars (with high bits
* set). Do not progress past the lastp barrier.
*
* Since we can progress more than one byte in the string,
* return the properly updated pointer p.
*/
static char *
name_addmultibytechar( newstr *s, char *p, char *lastp )
{
if ( ! ((*p) & 128) ) {
newstr_addchar( s, *p );
p++;
} else {
while ( p!=lastp && ((*p) & 128) ) {
newstr_addchar( s, *p );
p++;
}
}
return p;
}
static void
name_givennames_nosplit( char *start_first, char *end_first, newstr *outname )
{
char *p;
p = start_first;
while ( p!=end_first ) {
if ( !is_ws( *p ) && *p!='.' ) {
p = name_addmultibytechar( outname, p, end_first );
} else {
if ( *p=='.' ) p++;
while ( p!=end_first && is_ws( *p ) ) p++;
if ( p!=end_first )
newstr_addchar( outname, '|' );
}
}
}
static void
name_givennames_split( char *start_first, char *end_first, newstr *outname )
{
int n = 0;
char *p;
p = start_first;
while ( p!=end_first ) {
if ( !is_ws( *p ) ) {
if ( *p=='.' && *(p+1)=='-' ) {
newstr_strcat( outname, ".-" );
p++; p++;
p = skip_ws( p );
p = name_addmultibytechar( outname, p, end_first );
newstr_addchar( outname, '.' );
n++;
} else if ( *p=='.' ) {
p++;
} else if ( *p=='-' ) {
newstr_strcat( outname, ".-" );
p++;
p = skip_ws( p );
p = name_addmultibytechar( outname, p, end_first );
newstr_addchar( outname, '.' );
n++;
} else {
if ( n ) newstr_addchar( outname, '|' );
p = name_addmultibytechar( outname, p, end_first );
n++;
}
} else {
while ( p!=end_first && is_ws( *p ) ) p++;
}
}
}
static void
name_givennames( char *first_start, char *first_end, char *last_start,
char *last_end, newstr *outname )
{
int splitfirst;
newstr_addchar( outname, '|' );
splitfirst = should_split( last_start, last_end, first_start,
first_end );
if ( !splitfirst )
name_givennames_nosplit( first_start, first_end, outname );
else
name_givennames_split( first_start, first_end, outname );
}
static char *
string_end( char *p )
{
while ( *p ) p++;
return p;
}
/* name_nocomma()
*
* names in the format "H. F. Author"
*/
void
name_nocomma( char *start, newstr *outname )
{
char *p, *last_start, *last_end, *first_start, *first_end;
/** Last name **/
p = last_end = string_end( start );
while ( p!=start && !is_ws( *p ) ) p--;
if ( !strcasecmp( p+1, "Jr." ) || !strcasecmp( p+1, "III" ) ) {
while ( p!=start && is_ws( *p ) ) p--;
while ( p!=start && !is_ws( *p ) ) p--;
}
last_start = p = skip_ws( p );
while ( p!=last_end )
newstr_addchar( outname, *p++ );
/** Given names **/
if ( start!=last_start ) {
first_start = skip_ws( start );
first_end = last_start;
while ( first_end!=first_start && !is_ws( *first_end ) )
first_end--;
name_givennames( first_start, last_start, last_start, last_end,
outname );
}
}
/*
* name_comma()
*
* names in the format "Author, H.F.", w/comma
*/
void
name_comma( char *p, newstr *outname )
{
char *start_first, *end_first, *start_last, *end_last;
/** Last name **/
start_last = end_last = skip_ws( p );
while ( *p && ( *p!=',' ) ) {
newstr_addchar( outname, *p++ );
end_last = p;
}
/** Given names **/
if ( *p==',' ) p++;
start_first = skip_ws( p );
if ( *start_first ) {
end_first = string_end( start_first );
name_givennames( start_first, end_first, start_last, end_last,
outname );
}
}
/* Determine if name is of type "corporate" or if it
* should be added "as-is"; both should not be mangled.
*
* First check tag for prefixes ":CORP" and ":ASIS",
* then optionally check lists, bailing if "corporate"
* type can be identified.
*
* "corporate" is the same as "as-is" plus getting
* special MODS treatment, so "corporate" type takes
* priority
*/
static void
name_determine_flags( int *ctf, int *clf, int *atf, int *alf, char *tag, char *data, list *asis, list *corps )
{
int corp_tag_flag = 0, corp_list_flag = 0;
int asis_tag_flag = 0, asis_list_flag = 0;
if ( strstr( tag, ":CORP" ) ) corp_tag_flag = 1;
else if ( list_find( corps, data ) != -1 )
corp_list_flag = 1;
if ( strstr( tag, ":ASIS" ) ) {
asis_tag_flag = 1;
if ( list_find( corps, data ) != -1 )
corp_list_flag = 1;
} else {
if ( list_find( corps, data ) != -1 )
corp_list_flag = 1;
else if ( list_find( asis, data ) != -1 )
asis_list_flag = 1;
}
*ctf = corp_tag_flag;
*clf = corp_list_flag;
*atf = asis_tag_flag;
*alf = asis_list_flag;
}
/*
* return 1 on a nomangle with a newtag value
* return 0 on a name to mangle
*/
static int
name_nomangle( char *tag, char *data, newstr *newtag, list *asis, list *corps )
{
int corp_tag_flag, corp_list_flag;
int asis_tag_flag, asis_list_flag;
name_determine_flags( &corp_tag_flag, &corp_list_flag,
&asis_tag_flag, &asis_list_flag, tag, data, asis, corps );
if ( corp_tag_flag || corp_list_flag || asis_tag_flag || asis_list_flag ) {
newstr_strcpy( newtag, tag );
if ( corp_tag_flag ) { /* do nothing else */
} else if ( corp_list_flag && !asis_tag_flag ) {
newstr_strcat( newtag, ":CORP" );
} else if ( corp_list_flag && asis_tag_flag ) {
newstr_findreplace( newtag, ":ASIS", ":CORP" );
} else if ( asis_tag_flag ) { /* do nothing else */
} else if ( asis_list_flag ) {
newstr_strcat( newtag, ":ASIS" );
}
return 1;
}
else return 0;
}
static void
name_person( fields *info, char *tag, int level, newstr *inname )
{
newstr outname;
newstr_init( &outname );
if ( strchr( inname->data, ',' ) )
name_comma( inname->data, &outname );
else
name_nocomma( inname->data, &outname );
if ( outname.len!=0 )
fields_add( info, tag, outname.data, level );
newstr_free( &outname );
}
/*
* name_process
*
* returns 1 if "et al." needs to be added to the list globally
*/
static int
name_process( fields *info, char *tag, int level, newstr *inname, list *asis, list *corps )
{
newstr newtag;
int add_etal = 0;
/* keep "names" like " , " from coredumping program */
if ( !inname->len ) return 0;
/* identify and process asis or corps names */
newstr_init( &newtag );
if ( name_nomangle( tag, inname->data, &newtag, asis, corps ) ) {
fields_add( info, newtag.data, inname->data, level );
} else {
if ( strstr( inname->data, "et al." ) ) {
add_etal = 1;
newstr_findreplace( inname, "et al.", "" );
}
if ( inname->len ) {
name_person( info, tag, level, inname );
}
}
newstr_free( &newtag );
return add_etal;
}
/*
* name_add( info, newtag, data, level )
*
* take name(s) in data, multiple names should be separated by
* '|' characters and divide into individual name, e.g.
* "H. F. Author|W. G. Author|Q. X. Author"
*
* for each name, compare to names in the "as is" or "corporation"
* lists...these are not personal names and should be added to the
* bibliography fields directly and should not be mangled
*
* for each personal name, send to appropriate algorithm depending
* on if the author name is in the format "H. F. Author" or
* "Author, H. F."
*/
void
name_add( fields *info, char *tag, char *q, int level, list *asis, list *corps )
{
newstr inname, newtag;
char *p, *start, *end;
int add_etal = 0;
if ( !q ) return;
newstr_init( &inname );
newstr_init( &newtag );
while ( *q ) {
start = q = skip_ws( q );
/* strip trailing whitespace and commas; never step back past start */
while ( *q && *q!='|' ) q++;
end = q;
while ( end>start && ( is_ws( *(end-1) ) || *(end-1)==',' ) )
end--;
for ( p=start; p<end; p++ )
newstr_addchar( &inname, *p );
add_etal += name_process( info, tag, level, &inname, asis, corps );
#if 0
/* keep "names" like " , " from coredumping program */
if ( inname.len ) {
if ( name_nomangle( tag, inname.data, &newtag, asis, corps ) ) {
fields_add( info, newtag.data, inname.data, level );
newstr_empty( &newtag );
} else {
if ( strstr( inname.data, "et al." ) ) {
add_etal=1;
newstr_findreplace( &inname, "et al.", "" );
}
if ( inname.len ) name_person( info, tag, level, &inname, asis, corps );
}
newstr_empty( &inname );
}
#endif
newstr_empty( &inname );
if ( *q=='|' ) q++;
}
if ( add_etal ) {
newstr_strcpy( &newtag, tag );
newstr_strcat( &newtag, ":ASIS" );
fields_add( info, newtag.data, "et al.", level );
}
newstr_free( &inname );
newstr_free( &newtag );
}
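/*
 * Illustration only, not part of the bibutils sources: a minimal,
 * self-contained sketch of the list-splitting step that name_add()
 * describes above. The helper name next_name() and its fixed-size
 * output buffer are assumptions made for this example; the real code
 * accumulates each name in a newstr and hands it to name_process().
 */

```c
#include <string.h>
#include <ctype.h>

/* Hypothetical helper: copy the next name from a '|'-separated list
 * into out, without leading whitespace and without trailing
 * whitespace or commas. Returns a pointer to the rest of the list,
 * or NULL once the list is exhausted. nout must be at least 1. */
static const char *
next_name( const char *q, char *out, size_t nout )
{
	const char *start, *end;
	size_t n;

	if ( !q || !*q ) return NULL;

	/* skip leading whitespace, then scan to the separator */
	while ( isspace( (unsigned char) *q ) ) q++;
	start = q;
	while ( *q && *q!='|' ) q++;

	/* back up over trailing whitespace and commas, stopping at start */
	end = q;
	while ( end>start &&
	        ( isspace( (unsigned char) end[-1] ) || end[-1]==',' ) )
		end--;

	/* copy the trimmed name, truncating if the buffer is too small */
	n = (size_t)( end - start );
	if ( n >= nout ) n = nout - 1;
	memcpy( out, start, n );
	out[n] = '\0';

	if ( *q=='|' ) q++;
	return q;
}
```

/*
 * Calling next_name() repeatedly on "H. F. Author| W. G. Author ,|Q. X. Author"
 * yields the three names one at a time; an all-whitespace element
 * produces an empty string, which a caller would skip just as
 * name_process() returns early on an empty name.
 */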
bibutils_4.12/lib/biblatexin.h
/*
* biblatexin.h
*
* Copyright (c) Chris Putnam 2008-2010
*
* Program and source code released under the GPL
*
*/
#ifndef BIBLATEXIN_H
#define BIBLATEXIN_H
#include "newstr.h"
#include "list.h"
#include "fields.h"
#include "bibl.h"
#include "bibutils.h"
#include "reftypes.h"
extern void biblatexin_convertf( fields *bibin, fields *info, int reftype, param *p, variants *all, int nall );
extern int biblatexin_processf( fields *bibin, char *data, char *filename, long nref );
extern void biblatexin_cleanf( bibl *bin, param *p );
extern int biblatexin_readf( FILE *fp, char *buf, int bufsize, int *bufpos, newstr *line, newstr *reference, int *fcharset );
extern int biblatexin_typef( fields *bibin, char *filename, int nrefs,
param *p, variants *all, int nall );
extern void biblatexin_initparams( param *p, const char *progname );
extern variants biblatex_all[];
extern int biblatex_nall;
#endif
bibutils_4.12/lib/bibl.c
/*
* bibl.c
*
* Copyright (c) Chris Putnam 2005-2010
*
* Source code released under the GPL
*
*/
#include <stdio.h>
#include <stdlib.h>
#include "bibl.h"
void
bibl_init( bibl *b )
{
b->nrefs = b->maxrefs = 0L;
b->ref = NULL;
}
static void
bibl_malloc( bibl * b )
{
int alloc = 50;
b->nrefs = 0;
b->ref = ( fields ** ) malloc( sizeof( fields* ) * alloc );
if ( b->ref ) {
b->maxrefs = alloc;
} else {
fprintf( stderr, "bibl_malloc: allocation error\n" );
exit( EXIT_FAILURE );
}
}
static void
bibl_realloc( bibl * b )
{
int alloc = b->maxrefs * 2;
fields **more;
more = ( fields ** ) realloc( b->ref, sizeof( fields* ) * alloc );
if ( more ) {
b->ref = more;
b->maxrefs = alloc;
} else {
fprintf( stderr, "bibl_realloc: allocation error\n" );
exit( EXIT_FAILURE );
}
}
void
bibl_addref( bibl *b, fields *ref )
{
if ( b->maxrefs==0 ) bibl_malloc( b );
else if ( b->nrefs >= b->maxrefs ) bibl_realloc( b );
b->ref[ b->nrefs ] = ref;
b->nrefs++;
}
void
bibl_free( bibl *b )
{
long i;
for ( i=0; i<b->nrefs; ++i )
fields_free( b->ref[i] );
free( b->ref );
b->ref = NULL;
b->nrefs = b->maxrefs = 0;
}
void
bibl_copy( bibl *bout, bibl *bin )
{
fields *refin;
fields *refout;
int i, j;
for ( i=0; i<bin->nrefs; ++i ) {
refin = bin->ref[i];
refout = fields_new();
for ( j=0; j<refin->nfields; ++j ) {
if ( refin->tag[j].data && refin->data[j].data )
fields_add( refout, refin->tag[j].data,
refin->data[j].data, refin->level[j] );
}
bibl_addref( bout, refout );
}
}
bibutils_4.12/lib/title.c
/*
* title.c
*
* process titles into title/subtitle pairs for MODS
*
* Copyright (c) Chris Putnam 2004-2010
*
* Source code released under the GPL
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "newstr.h"
#include "fields.h"
#include "title.h"
#include "is_ws.h"
void
title_process( fields *info, char *tag, char *data, int level,
unsigned char nosplittitle )
{
newstr title, subtitle;
char *p, *q;
newstr_init( &title );
newstr_init( &subtitle );
if ( nosplittitle ) q = NULL;
else {
q = strstr( data, ": " );
if ( !q ) q = strstr( data, "? " );
}
if ( !q ) newstr_strcpy( &title, data );
else {
p = data;
while ( p!=q ) newstr_addchar( &title, *p++ );
if ( *q=='?' ) newstr_addchar( &title, '?' );
q++;
q = skip_ws( q );
while ( *q ) newstr_addchar( &subtitle, *q++ );
}
if ( strncasecmp( "SHORT", tag, 5 ) ) {
if ( title.len>0 )
fields_add( info, "TITLE", title.data, level );
if ( subtitle.len>0 )
fields_add( info, "SUBTITLE", subtitle.data, level );
} else {
if ( title.len>0 )
fields_add( info, "SHORTTITLE", title.data, level );
/* no SHORT-SUBTITLE! */
}
newstr_free( &subtitle );
newstr_free( &title );
}
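/*
 * Illustration only, not part of the bibutils sources: a minimal,
 * self-contained sketch of the splitting rule used by title_process()
 * above. The helper name split_title() and the fixed-size buffers are
 * assumptions made for this example; the real function builds newstr
 * values and stores them with fields_add().
 */

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split data at the first ": " or, failing that,
 * the first "? ". A '?' separator stays with the title; a ':' is
 * dropped. Without a separator the whole string is the title and the
 * subtitle is empty. Both buffer sizes must be at least 1. */
static void
split_title( const char *data, char *title, size_t ntitle,
             char *subtitle, size_t nsub )
{
	const char *q = strstr( data, ": " );
	size_t n;

	if ( !q ) q = strstr( data, "? " );
	if ( !q ) {
		snprintf( title, ntitle, "%s", data );
		subtitle[0] = '\0';
		return;
	}

	n = (size_t)( q - data );
	if ( *q=='?' ) n++;                  /* keep the question mark */
	if ( n >= ntitle ) n = ntitle - 1;   /* truncate to fit */
	memcpy( title, data, n );
	title[n] = '\0';

	q++;                                 /* step over ':' or '?' */
	while ( *q==' ' || *q=='\t' ) q++;   /* skip separator whitespace */
	snprintf( subtitle, nsub, "%s", q );
}
```

/*
 * For example, "Bibutils: converting references" splits into the title
 * "Bibutils" and the subtitle "converting references", while
 * "Why convert? A short answer" keeps the '?' on the title.
 */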
bibutils_4.12/lib/entities.c
/*
* entities.c
*
* Copyright (c) Chris Putnam 2003-2010
*
* Source code released under the GPL
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "entities.h"
/* HTML 4.0 entities */
typedef struct entities {
char html[20];
unsigned int unicode;
} entities;
entities html_entities[] = {
/* Special Entities */
{ "&quot;", 34 }, /* quotation mark */
{ "&amp;", 38 }, /* ampersand */
{ "&apos;", 39 }, /* apostrophe */
{ "&lpar;", 40 }, /* left parenthesis */
{ "&rpar;", 41 }, /* right parenthesis */
{ "&hyphen;", 45 }, /* hyphen */
{ "&lt;", 60 }, /* less-than sign */
{ "&gt;", 62 }, /* greater-than sign */
{ "&quest;", 63 }, /* question mark */
{ "&OElig;", 338 }, /* Latin cap ligature OE */
{ "&oelig;", 339 }, /* Latin small ligature OE */
{ "&Scaron;", 352 }, /* Latin cap S with caron */
{ "&scaron;", 353 }, /* Latin small s with caron */
{ "&Yuml;", 376 }, /* Latin cap Y with diaeresis */
{ "&circ;", 710 }, /* modifier letter circumflex */
{ "&tilde;", 732 }, /* small tilde */
{ "&ensp;", 8194 }, /* en space */
{ "&emsp;", 8195 }, /* em space */
{ "&thinsp;", 8201 }, /* thin space */
{ "&zwnj;", 8204 }, /* zero width non-joiner */
{ "&zwj;", 8205 }, /* zero width joiner */
{ "&lrm;", 8206 }, /* left-to-right mark */
{ "&rlm;", 8207 }, /* right-to-left mark */
{ "&ndash;", 8211 }, /* en dash */
{ "&mdash;", 8212 }, /* em dash */
{ "&lsquo;", 8216 }, /* left single quotation mark */
{ "&rsquo;", 8217 }, /* right single quot. mark */
{ "&sbquo;", 8218 }, /* single low-9 quot. mark */
{ "&ldquo;", 8220 }, /* left double quot. mark */
{ "&rdquo;", 8221 }, /* right double quot. mark */
{ "&bdquo;", 8222 }, /* double low-9 quot. mark */
{ "&dagger;", 8224 }, /* dagger */
{ "&Dagger;", 8225 }, /* double dagger */
{ "&permil;", 8240 }, /* per mille sign */
{ "&lsaquo;", 8249 }, /* sin. left angle quot mark */
{ "&rsaquo;", 8250 }, /* sin. right angle quot mark */
{ "&euro;", 8364 }, /* euro sign */
/* Symbols and Greek characters */
{ "&fnof;", 402 }, /* small f with hook = function */
{ "&Alpha;", 913 }, /* capital alpha */
{ "&Beta;", 914 }, /* capital beta */
{ "&Gamma;", 915 }, /* capital gamma */
{ "&Delta;", 916 }, /* capital delta */
{ "&Epsilon;", 917 }, /* capital epsilon */
{ "&Zeta;", 918 }, /* capital zeta */
{ "&Eta;", 919 }, /* capital eta */
{ "&Theta;", 920 }, /* capital theta */
{ "&Iota;", 921 }, /* capital iota */
{ "&Kappa;", 922 }, /* capital kappa */
{ "&Lambda;", 923 }, /* capital lambda */
{ "&Mu;", 924 }, /* capital mu */
{ "&Nu;", 925 }, /* capital nu */
{ "&Xi;", 926 }, /* capital xi */
{ "&Omicron;", 927 }, /* capital omicron */
{ "&Pi;", 928 }, /* capital pi */
{ "&Rho;", 929 }, /* capital rho */
{ "&Sigma;", 931 }, /* capital sigma */
{ "&Tau;", 932 }, /* capital tau */
{ "&Upsilon;", 933 }, /* capital upsilon */
{ "&Phi;", 934 }, /* capital phi */
{ "&Chi;", 935 }, /* capital chi */
{ "&Psi;", 936 }, /* capital psi */
{ "&Omega;", 937 }, /* capital omega */
{ "&alpha;", 945 }, /* small alpha */
{ "&beta;", 946 }, /* small beta */
{ "&gamma;", 947 }, /* small gamma */
{ "&delta;", 948 }, /* small delta */
{ "&epsilon;", 949 }, /* small epsilon */
{ "&zeta;", 950 }, /* small zeta */
{ "&eta;", 951 }, /* small eta */
{ "&theta;", 952 }, /* small theta */
{ "&iota;", 953 }, /* small iota */
{ "&kappa;", 954 }, /* small kappa */
{ "&lambda;", 955 }, /* small lambda */
{ "&mu;", 956 }, /* small mu */
{ "&nu;", 957 }, /* small nu */
{ "&xi;", 958 }, /* small xi */
{ "&omicron;", 959 }, /* small omicron */
{ "&pi;", 960 }, /* small pi */
{ "&rho;", 961 }, /* small rho */
{ "&sigmaf;", 962 }, /* small final sigma */
{ "&sigma;", 963 }, /* small sigma */
{ "&tau;", 964 }, /* small tau */
{ "&upsilon;", 965 }, /* small upsilon */
{ "&phi;", 966 }, /* small phi */
{ "&chi;", 967 }, /* small chi */
{ "&psi;", 968 }, /* small psi */
{ "&omega;", 969 }, /* small omega */
{ "&thetasym;", 977 }, /* small theta symbol */
{ "&upsih;", 978 }, /* small upsilon with hook */
{ "&piv;", 982 }, /* pi symbol */
{ "&bull;", 8226 }, /* bullet = small blk circle */
{ "&hellip;", 8230 }, /* horizontal ellipsis */
{ "&prime;", 8242 }, /* prime = minutes = feet */
{ "&Prime;", 8243 }, /* double prime */
{ "&oline;", 8254 }, /* overline */
{ "&frasl;", 8260 }, /* fraction slash */
{ "&weierp;", 8472 }, /* Weierstrass p = power set */
{ "&image;", 8465 }, /* imaginary part-black cap I */
{ "&real;", 8476 }, /* real part-black cap R */
{ "&trade;", 8482 }, /* trademark sign */
{ "&alefsym;", 8501 }, /* alef symbol */
{ "&larr;", 8592 }, /* left arrow */
{ "&uarr;", 8593 }, /* up arrow */
{ "&rarr;", 8594 }, /* right arrow */
{ "&darr;", 8595 }, /* down arrow */
{ "&harr;", 8596 }, /* left/right arrow */
{ "&crarr;", 8629 }, /* down arrow with corner left */
{ "&lArr;", 8656 }, /* left double arrow */
{ "&uArr;", 8657 }, /* up double arrow */
{ "&rArr;", 8658 }, /* right double arrow */
{ "&dArr;", 8659 }, /* down double arrow */
{ "&hArr;", 8660 }, /* left/right double arrow */
{ "&forall;", 8704 }, /* for all */
{ "&part;", 8706 }, /* partial differential */
{ "&exist;", 8707 }, /* there exists */
{ "&empty;", 8709 }, /* empty set */
{ "&nabla;", 8711 }, /* nabla=backwards difference */
{ "&isin;", 8712 }, /* element of */
{ "&notin;", 8713 }, /* not an element of */
{ "&ni;", 8715 }, /* contains as member */
{ "&prod;", 8719 }, /* n-ary product */
{ "&sum;", 8721 }, /* n-ary summation */
{ "&minus;", 8722 }, /* minus sign */
{ "&lowast;", 8727 }, /* asterisk operator */
{ "&radic;", 8730 }, /* square root */
{ "&prop;", 8733 }, /* proportional to */
{ "&infin;", 8734 }, /* infinity */
{ "&ang;", 8736 }, /* angle */
{ "&and;", 8743 }, /* logical and */
{ "&or;", 8744 }, /* logical or */
{ "&cap;", 8745 }, /* intersection */
{ "&cup;", 8746 }, /* union */
{ "&int;", 8747 }, /* integral */
{ "&there4;", 8756 }, /* therefore */
{ "&sim;", 8764 }, /* tilde operator */
{ "&cong;", 8773 }, /* approximately equal to */
{ "&asymp;", 8776 }, /* asymptotic to */
{ "&ne;", 8800 }, /* not equal to */
{ "&equiv;", 8801 }, /* identical to */
{ "&le;", 8804 }, /* less-than or equal to */
{ "&ge;", 8805 }, /* greater-than or equal to */
{ "&sub;", 8834 }, /* subset of */
{ "&sup;", 8835 }, /* superset of */
{ "&nsub;", 8836 }, /* not a subset of */
{ "&sube;", 8838 }, /* subset of or equal to */
{ "&supe;", 8839 }, /* superset of or equal to */
{ "&oplus;", 8853 }, /* circled plus = direct sum */
{ "&otimes;", 8855 }, /* circled times = vec prod */
{ "&perp;", 8869 }, /* perpendicular */
{ "&sdot;", 8901 }, /* dot operator */
{ "&lceil;", 8968 }, /* left ceiling */
{ "&rceil;", 8969 }, /* right ceiling */
{ "&lfloor;", 8970 }, /* left floor */
{ "&rfloor;", 8971 }, /* right floor */
{ "&lang;", 9001 }, /* left angle bracket */
{ "&rang;", 9002 }, /* right angle bracket */
{ "&loz;", 9674 }, /* lozenge */
{ "&spades;", 9824 }, /* spades */
{ "&clubs;", 9827 }, /* clubs */
{ "&hearts;", 9829 }, /* hearts */
{ "&diams;", 9830 }, /* diamonds */
/* Latin-1 */
{ "&nbsp;", 32 }, /* non-breaking space */
{ "&iexcl;", 161 }, /* inverted exclamation mark */
{ "&cent;", 162 }, /* cent sign */
{ "&pound;", 163 }, /* pound sign */
{ "&curren;", 164 }, /* currency sign */
{ "&yen;", 165 }, /* yen sign */
{ "&brvbar;", 166 }, /* broken vertical bar */
{ "&sect;", 167 }, /* section sign */
{ "&uml;", 168 }, /* diaeresis - spacing diaeresis */
{ "&copy;", 169 }, /* copyright sign */
{ "&ordf;", 170 }, /* feminine ordinal indicator */
{ "&laquo;", 171 }, /* left-pointing guillemet */
{ "&not;", 172 }, /* not sign */
{ "&shy;", 173 }, /* soft (discretionary) hyphen */
{ "&reg;", 174 }, /* registered sign */
{ "&macr;", 175 }, /* macron = overline */
{ "&deg;", 176 }, /* degree sign */
{ "&plusmn;", 177 }, /* plus-minus sign */
{ "&sup2;", 178 }, /* superscript two */
{ "&sup3;", 179 }, /* superscript three */
{ "&acute;", 180 }, /* acute accent = spacing acute */
{ "&micro;", 181 }, /* micro sign */
{ "&para;", 182 }, /* pilcrow (paragraph) sign */
{ "&middot;", 183 }, /* middle dot (georgian comma) */
{ "&cedil;", 184 }, /* cedilla = spacing cedilla */
{ "&sup1;", 185 }, /* superscript one */
{ "&ordm;", 186 }, /* masculine ordinal indicator */
{ "&raquo;", 187 }, /* right pointing guillemet */
{ "&frac14;", 188 }, /* 1/4 */
{ "&frac12;", 189 }, /* 1/2 */
{ "&frac34;", 190 }, /* 3/4 */
{ "&iquest;", 191 }, /* inverted question mark */
{ "&Agrave;", 192 }, /* cap A with grave */
{ "&Aacute;", 193 }, /* cap A with acute */
{ "&Acirc;", 194 }, /* cap A with circumflex */
{ "&Atilde;", 195 }, /* cap A with tilde */
{ "&Auml;", 196 }, /* cap A with diaeresis */
{ "&Aring;", 197 }, /* cap A with ring */
{ "&AElig;", 198 }, /* cap AE ligature */
{ "&Ccedil;", 199 }, /* cap C with cedilla */
{ "&Egrave;", 200 }, /* cap E with grave */
{ "&Eacute;", 201 }, /* cap E with acute */
{ "&Ecirc;", 202 }, /* cap E with circumflex */
{ "&Euml;", 203 }, /* cap E with diaeresis */
{ "&Igrave;", 204 }, /* cap I with grave */
{ "&Iacute;", 205 }, /* cap I with acute */
{ "&Icirc;", 206 }, /* cap I with circumflex */
{ "&Iuml;", 207 }, /* cap I with diaeresis */
{ "&ETH;", 208 }, /* cap letter ETH */
{ "&Ntilde;", 209 }, /* cap N with tilde */
{ "&Ograve;", 210 }, /* cap O with grave */
{ "&Oacute;", 211 }, /* cap O with acute */
{ "&Ocirc;", 212 }, /* cap O with circumflex */
{ "&Otilde;", 213 }, /* cap O with tilde */
{ "&Ouml;", 214 }, /* cap O with diaeresis */
{ "&times;", 215 }, /* multiplication sign */
{ "&Oslash;", 216 }, /* cap O with stroke */
{ "&Ugrave;", 217 }, /* cap U with grave */
{ "&Uacute;", 218 }, /* cap U with acute */
{ "&Ucirc;", 219 }, /* cap U with circumflex */
{ "&Uuml;", 220 }, /* cap U with diaeresis */
{ "&Yacute;", 221 }, /* cap Y with acute */
{ "&THORN;", 222 }, /* cap letter THORN */
{ "&szlig;", 223 }, /* small sharp s = ess-zed */
{ "&agrave;", 224 }, /* small a with grave */
{ "&aacute;", 225 }, /* small a with acute */
{ "&acirc;", 226 }, /* small a with circumflex */
{ "&atilde;", 227 }, /* small a with tilde */
{ "&auml;", 228 }, /* small a with diaeresis */
{ "&aring;", 229 }, /* small a with ring */
{ "&aelig;", 230 }, /* small ligature ae */
{ "&ccedil;", 231 }, /* small c with cedilla */
{ "&egrave;", 232 }, /* small e with grave */
{ "&eacute;", 233 }, /* small e with acute */
{ "&ecirc;", 234 }, /* small e with circumflex */
{ "&euml;", 235 }, /* small e with diaeresis */
{ "&igrave;", 236 }, /* small i with grave */
{ "&iacute;", 237 }, /* small i with acute */
{ "&icirc;", 238 }, /* small i with circumflex */
{ "&iuml;", 239 }, /* small i with diaeresis */
{ "&eth;", 240 }, /* latin small letter eth */
{ "&ntilde;", 241 }, /* small n with tilde */
{ "&ograve;", 242 }, /* small o with grave */
{ "&oacute;", 243 }, /* small o with acute */
{ "&ocirc;", 244 }, /* small o with circumflex */
{ "&otilde;", 245 }, /* small o with tilde */
{ "&ouml;", 246 }, /* small o with diaeresis */
{ "&divide;", 247 }, /* division sign */
{ "&oslash;", 248 }, /* small o with slash */
{ "&ugrave;", 249 }, /* small u with grave */
{ "&uacute;", 250 }, /* small u with acute */
{ "&ucirc;", 251 }, /* small u with circumflex */
{ "&uuml;", 252 }, /* small u with diaeresis */
{ "&yacute;", 253 }, /* small y with acute */
{ "&thorn;", 254 }, /* latin small letter thorn */
{ "&yuml;", 255 }, /* small y with diaeresis */
};
static unsigned int
decode_html_entity( char *s, unsigned int *pi, int *err )
{
int nhtml_entities = sizeof( html_entities ) / sizeof( entities );
char *e;
int i, n=-1, len;
for ( i=0; i