Description
Is your feature request related to a problem? Please describe.
The bibtex file format is ill-defined when it comes to case sensitivity on keys. This is NOT a duplicate of #453, because that only talks about entry types.
There is a great deal of confusion about case sensitivity in bibtex. This applies to:
- entry types
- entry keys
- field keys
There are also different use cases for bibtexparser. Some people want to parse a bibtex file and get the same thing back when they print it out. Others want bibtexparser to parse in a way that is close to the behavior of some other tool that parses bibtex files (notably the bibtex binary and the biblatex package). This is part of the problem, because different processing tools exhibit different behavior when they encounter keys that agree in lower case.
For example, consider the following LaTeX file:
\begin{filecontents}[overwrite]{the.bib}
@misc{CamelCase,
author = {Fester Bestertester},
Title = {What happens to this title?},
title = {This has a camel case key},
}
@misc{camelcase,
author = {Foster Bostertoster},
title = {This has a lower case key}
}
@misc{another,
aUTHOR = {Anthony Ordinary},
title = {Just a third entry},
}
\end{filecontents}
%%%%%%%%%%%%%%%%%%%
\documentclass{article}
% Uncomment the next line and use biber to see the difference. You will have to remove main.bbl and main.aux first.
%\usepackage{biblatex}
\usepackage{hyperref}
\IfPackageLoadedTF{biblatex}{\addbibresource{the.bib}}{\bibliographystyle{alpha}}
\begin{document}
I don't have much to say.
\cite{camelcase} and \cite{CamelCase} and \cite{another}.
\IfPackageLoadedTF{biblatex}{\printbibliography}{\bibliography{the}}
\end{document}
This example illustrates the difference between bibtex and biblatex. If you process it with the bibtex binary, it produces the following messages from bibtex:
Case mismatch error between cite keys CamelCase and camelcase
---line 7 of file main.aux
: \citation{CamelCase
: }
I'm skipping whatever remains of this command
Database file #1: the.bib
Warning--I'm ignoring camelcase's extra "title" field
--line 8 of file the.bib
Repeated entry---line 11 of file the.bib
: @misc{camelcase
: ,
I'm skipping whatever remains of this entry
If you view the PDF, it took the first Title field and dropped the second title field. It also dropped the second camelcase entry, producing an undefined reference. Hence you may consider the bibtex binary to treat both entry keys and field keys as case-insensitive. From my observation of author behavior, about 90% use bibtex and maybe 10% use biblatex. Since the bibtex file format was originally bundled with the bibtex binary, I consider this to be the proper interpretation of case sensitivity, but others may disagree.
Now consider the case of biblatex. Uncomment the line that loads biblatex, remove main.aux and main.bbl, and run pdflatex main;biber main;pdflatex main;pdflatex main. The resulting PDF file contains three references, and the first reference takes the second title, "This has a camel case key".
The decades-long problem here is that the syntax of the original bibtex file format was never really defined (and it's still on version 0.99d). There are various tools to parse and handle bibtex files, but they behave differently because they interpret the file format differently. You could argue that both bibtex and biblatex treat entry keys and field keys as lower case, but they behave differently when they encounter keys that have the same lower case form. Perhaps other tools have their own weird behavior based on their own interpretation of the incomplete bibtex file format.
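The difference in how the two tools handle colliding entry keys can be sketched in a few lines. This is only an illustration of the behavior observed above, not code from bibtex, biber, or bibtexparser: the `dedupe_entries` helper and the `(key, payload)` tuples are hypothetical stand-ins for parsed entries, and keeping the first of two exactly identical keys in the case-sensitive mode is an assumption.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a parsed entry: (entry key, payload).
Entry = Tuple[str, str]


def dedupe_entries(
    entries: Iterable[Entry], case_sensitive: bool
) -> Tuple[List[Entry], List[Entry]]:
    """Split entries into (kept, dropped), keeping the first entry per key.

    case_sensitive=False mimics the bibtex behavior observed above, where
    keys that agree in lower case count as repeated entries.
    case_sensitive=True mimics the biber behavior observed above, where
    only keys that match exactly collide (assumption: first one wins).
    """
    seen = set()
    kept: List[Entry] = []
    dropped: List[Entry] = []
    for key, payload in entries:
        lookup = key if case_sensitive else key.lower()
        if lookup in seen:
            dropped.append((key, payload))
        else:
            seen.add(lookup)
            kept.append((key, payload))
    return kept, dropped


# The three entries from the example .bib file, reduced to key and author.
entries = [
    ("CamelCase", "Fester Bestertester"),
    ("camelcase", "Foster Bostertoster"),
    ("another", "Anthony Ordinary"),
]

bibtex_like, bibtex_dropped = dedupe_entries(entries, case_sensitive=False)
biber_like, biber_dropped = dedupe_entries(entries, case_sensitive=True)
```

With the example data, the case-insensitive mode drops the second camelcase entry, while the case-sensitive mode keeps all three.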
I came across this problem because I was using bibtexparser to produce an HTML format for the bibtex entries, and I wanted our system to emulate the behavior of both biblatex and bibtex.
The solutions that I came to:
- I wrote middleware to convert entry types to lower case, since both bibtex and biblatex do the same. That way it's easier to decide how to format the entries. The only reason I can see to preserve the original case is if the bibtexparser user expects to see the same thing after parsing and writing out again.
- Because \cite is case-sensitive, I decided not to convert entry keys to lower case. It appears that bibtexparser.parse_string does not check the case of keys, and only declares a duplicate if the keys match in their original case. The second and subsequent entries with the same key are kept as DuplicateBlockKeyBlock blocks but are not treated as entries. This differs from the behavior of the bibtex tool, which drops an entry if its lower case key is the same as one already seen. It is consistent with how biber parses the entries.
- I wrote middleware to convert field keys to lower case (it's easier to look them up that way, and both biblatex and bibtex treat them as such). There is a question as to whether to take the first or last field encountered when there are duplicates, and it depends on whether you are trying to mimic bibtex or biblatex (or something else). I see no reason to keep both 'title' and 'Title' field keys, but this depends on the use case. I use a flag in the constructor to choose between "keep all", "keep first", or "keep last".
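The field-key handling described in the last item can be sketched as a small standalone function. This is not the middleware itself and does not use the bibtexparser API: the `normalize_field_keys` name, the policy strings, and the `(key, value)` tuples are hypothetical, chosen only to show the three policies.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a parsed field: (field key, value).
Field = Tuple[str, str]

POLICIES = ("keep_all", "keep_first", "keep_last")


def normalize_field_keys(fields: List[Field], policy: str = "keep_first") -> List[Field]:
    """Lower-case all field keys and resolve collisions according to policy.

    keep_all   -> lower-case the keys but retain every field, duplicates included
    keep_first -> keep the first value seen per lower-cased key (bibtex-like)
    keep_last  -> keep the last value seen per lower-cased key (biblatex-like)
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}, expected one of {POLICIES}")

    lowered = [(key.lower(), value) for key, value in fields]
    if policy == "keep_all":
        return lowered

    # A dict keeps each key at the position of its first occurrence;
    # under keep_last only the value is overwritten by later duplicates.
    chosen: Dict[str, str] = {}
    for key, value in lowered:
        if policy == "keep_last" or key not in chosen:
            chosen[key] = value
    return list(chosen.items())


# Fields of the CamelCase entry from the example .bib file.
fields = [
    ("author", "Fester Bestertester"),
    ("Title", "What happens to this title?"),
    ("title", "This has a camel case key"),
]

first = normalize_field_keys(fields, "keep_first")
last = normalize_field_keys(fields, "keep_last")
everything = normalize_field_keys(fields, "keep_all")
```

On the example entry, "keep_first" selects "What happens to this title?" as bibtex did, and "keep_last" selects "This has a camel case key" as biblatex did.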
We are using bibtexparser in a system to process latex+bibtex that is uploaded by authors. Our system uses bibexport to extract the entries that are actually cited, and this uses the bibtex binary in the script. This tool only works if the authors use the bibtex tool, since it looks in the .aux file for \bibcite. In order to get around this for authors who use biblatex, our system creates an artificial .aux file that looks like it was produced by the bibtex tool, and we process that with bibexport so that it can extract the entries. Of course biber and bibtex treat duplicate keys differently, so this will fail if authors depend on the biber behavior to save entries with keys that collide in lower case with others.
Describe the solution you'd like
The bottom line here is that software tools that handle the bibtex file format are inconsistent in how they treat keys. It seems useful to offer options for bibtexparser to emulate the behavior of other tools that process the bibtex file format. This can be customized by the use of middleware, and it might be useful to have additional standard middleware classes to support the different behavior required. It also seems that a replacement for the bibtex file format is long overdue. There are too many nonstandard entry types and field types. It's probably too late to fix the definition of the bibtex file format unless we add something like @version at the beginning of the file to say what tools the file is intended to be processed with. I don't think that's the job of bibtexparser though, unless it is used in a tool to replace the bibtex or biber tools.
I would be willing to contribute a PR to offer other middleware to handle these cases.