More flexible handling of case sensitivity in all keys

**Is your feature request related to a problem? Please describe.**
The bibtex file format is ill-defined when it comes to case sensitivity on keys. This is NOT a duplicate of #453, because that only talks about entry types. 

There is a great deal of confusion about case sensitivity in bibtex. This applies to:
1. entry types
2. entry keys
3. field keys

There are also different use cases for bibtexparser. Some people want to parse a bibtex file and get the same thing back when they write it out. Others want parsing to stay close to the behavior of some other tool that reads bibtex files (notably the `bibtex` binary and the `biblatex` package). This is part of the problem, because different processing tools behave differently when they encounter keys that agree in lower case.

For example, consider the following LaTeX file:
```
\begin{filecontents}[overwrite]{the.bib}
@misc{CamelCase,
  author = {Fester Bestertester},
  Title = {What happens to this title?},
  title = {This has a camel case key},
}
@misc{camelcase,
  author = {Foster Bostertoster},
  title = {This has a lower case key}
}
@misc{another, 
 aUTHOR = {Anthony Ordinary},
 title = {Just a third entry},
}
\end{filecontents}
%%%%%%%%%%%%%%%%%%%
\documentclass{article}
% Uncomment the next line and use biber to see the difference. You will have to remove main.bbl and main.aux first.
%\usepackage{biblatex}
\usepackage{hyperref}
\IfPackageLoadedTF{biblatex}{\addbibresource{the.bib}}{\bibliographystyle{alpha}}
\begin{document}
I don't have much to say.
\cite{camelcase} and \cite{CamelCase} and \cite{another}.
\IfPackageLoadedTF{biblatex}{\printbibliography}{\bibliography{the}}
\end{document}
```
This example illustrates the difference between `bibtex` and `biblatex`. If you process it with the `bibtex` binary, you get the following messages:
```
Case mismatch error between cite keys CamelCase and camelcase
---line 7 of file main.aux
 : \citation{CamelCase
 :                    }
I'm skipping whatever remains of this command
Database file #1: the.bib
Warning--I'm ignoring camelcase's extra "title" field
--line 8 of file the.bib
Repeated entry---line 11 of file the.bib
 : @misc{camelcase
 :                ,
I'm skipping whatever remains of this entry
```
If you view the PDF, it took the first `Title` field and dropped the second `title` field. It also dropped the second `camelcase` entry, producing an undefined reference. Hence you may consider the `bibtex` binary to treat both entry keys and field keys as case-insensitive. From my observation of author behavior, about 90% use bibtex and maybe 10% use biblatex. Since the bibtex file format was originally bundled with the `bibtex` binary, I consider this to be the proper interpretation of case sensitivity, but others may disagree.

Now consider the case of `biblatex`. Uncomment the line to load biblatex, remove main.aux and main.bbl, and run `pdflatex main;biber main;pdflatex main;pdflatex main`. The resulting PDF file contains _three_ references, and the first reference takes the _second_ title `"This has a camel case key"`.

The decades-long problem here is that the syntax of the original bibtex file format was never really defined (and it's _still_ on version 0.99d). There are various tools that parse and handle these files, but they behave differently because they interpret the file format differently. You could argue that both `bibtex` and `biblatex` treat entry keys and field keys as lower case, but they behave differently when they encounter keys that agree in lower case. Perhaps other tools have their own weird behavior based on their own interpretation of the incomplete bibtex file format.
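To make the two observed behaviors concrete, here is a minimal sketch on plain Python data. It does not use bibtexparser's API; the `Entry` tuple, the `resolve` function, and its `fold_entry_keys` and `keep_field` parameters are hypothetical names chosen for illustration. It only models what the example above showed, and it assumes that exact duplicate keys keep the first entry in both modes.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a parsed entry (not a bibtexparser class)."""
    entry_type: str
    key: str
    fields: list[tuple[str, str]]  # (field key, value) pairs in file order


def resolve(entries, *, fold_entry_keys, keep_field):
    """Return {entry key: {lower-cased field key: value}}.

    fold_entry_keys: if True, entry keys that agree in lower case count as
                     duplicates and only the first one seen is kept.
    keep_field:      "first" or "last", applied when field keys agree in
                     lower case.
    """
    if keep_field not in ("first", "last"):
        raise ValueError("keep_field must be 'first' or 'last'")
    resolved = {}
    seen = set()
    for entry in entries:
        ident = entry.key.lower() if fold_entry_keys else entry.key
        if ident in seen:
            continue  # a later duplicate is dropped (assumption for both modes)
        seen.add(ident)
        fields = {}
        for name, value in entry.fields:
            name = name.lower()
            if keep_field == "first" and name in fields:
                continue  # keep the value that was seen first
            fields[name] = value  # with "last", later values overwrite
        resolved[entry.key] = fields
    return resolved


# The three entries from the LaTeX example above.
ENTRIES = [
    Entry("misc", "CamelCase", [
        ("author", "Fester Bestertester"),
        ("Title", "What happens to this title?"),
        ("title", "This has a camel case key"),
    ]),
    Entry("misc", "camelcase", [
        ("author", "Foster Bostertoster"),
        ("title", "This has a lower case key"),
    ]),
    Entry("misc", "another", [
        ("aUTHOR", "Anthony Ordinary"),
        ("title", "Just a third entry"),
    ]),
]

# Behavior observed with the bibtex binary: fold entry keys, first field wins.
bibtex_like = resolve(ENTRIES, fold_entry_keys=True, keep_field="first")
# Behavior observed with biblatex + biber: keys kept apart, last field wins.
biber_like = resolve(ENTRIES, fold_entry_keys=False, keep_field="last")
```

With these settings, `bibtex_like` holds two entries and `biber_like` holds three, which matches the two PDFs described above.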

I came across this problem because I was using `bibtexparser` to produce an HTML format for the bibtex entries, and I wanted our system to emulate the behavior of both `biblatex` and `bibtex`.

The solutions that I came to:
1. I wrote middleware to convert entry types to lower case, as both `bibtex` and `biblatex` do. That makes it easier to decide how to format the entries. The only reason I can see to preserve the original case is if the `bibtexparser` user expects to get the same thing back after parsing and writing out again.
2. Because `\cite` is case-sensitive, I decided not to convert entry keys to lower case. It appears that `bibtexparser.parse_string` does not fold the case of keys, and only declares a duplicate if the keys match in their original case. The second and subsequent entries with the same key are kept as `DuplicateBlockKeyBlock`s but are not included in `entries`. This differs from the `bibtex` tool, which drops an entry if its lower-cased key matches one already seen. It is consistent with how biber parses the entries.
3. I wrote middleware to convert field keys to lower case (it's easier to look them up that way, and both `biblatex` and `bibtex` treat them as such). There is a question as to whether to take the first or last field encountered when there are duplicates, and it depends on whether you are trying to mimic `bibtex` or `biblatex` (or something else).  I see no reason to keep both 'title' and 'Title' field keys, but this depends on the use case. I use a flag in the constructor to choose between "keep all", "keep first", or "keep last".
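The field-key handling in item 3 could look roughly like the sketch below. This is not my actual middleware and it does not touch bibtexparser's classes: `DuplicatePolicy` and `LowerCaseFieldKeys` are hypothetical names, and fields are modeled as a plain list of `(key, value)` pairs. One assumption to note: under "keep all" the keys are still lower-cased and both pairs are retained.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    KEEP_ALL = "keep all"
    KEEP_FIRST = "keep first"
    KEEP_LAST = "keep last"


class LowerCaseFieldKeys:
    """Hypothetical helper: lower-case field keys and resolve duplicates."""

    def __init__(self, policy=DuplicatePolicy.KEEP_FIRST):
        # Accepts either an enum member or its string value.
        self.policy = DuplicatePolicy(policy)

    def __call__(self, fields):
        lowered = [(key.lower(), value) for key, value in fields]
        if self.policy is DuplicatePolicy.KEEP_ALL:
            return lowered
        if self.policy is DuplicatePolicy.KEEP_LAST:
            # Walk backwards so that the last occurrence is seen first.
            lowered.reverse()
        seen = set()
        kept = []
        for key, value in lowered:
            if key not in seen:
                seen.add(key)
                kept.append((key, value))
        if self.policy is DuplicatePolicy.KEEP_LAST:
            kept.reverse()  # restore file order
        return kept


fields = [
    ("author", "Fester Bestertester"),
    ("Title", "What happens to this title?"),
    ("title", "This has a camel case key"),
]
first = LowerCaseFieldKeys("keep first")(fields)
last = LowerCaseFieldKeys("keep last")(fields)
everything = LowerCaseFieldKeys("keep all")(fields)
```

Here "keep first" corresponds to what I saw from `bibtex` and "keep last" to what I saw from `biblatex` with `biber`.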

We are using `bibtexparser` in a system to process latex+bibtex that is uploaded by authors. Our system uses `bibexport` to extract the entries that are actually cited, and this uses the `bibtex` binary in the script. This tool only works if the authors use the `bibtex` tool, since it looks in the `.aux` file for `\bibcite`. In order to get around this for authors who use `biblatex`, our system creates an artificial `.aux` file that looks like it was produced by the `bibtex` tool, and we process that with `bibexport` so that it can extract the entries. Of course `biber` and `bibtex` treat duplicate keys differently, so this will fail if authors depend on the `biber` behavior to save entries with keys that collide in lower case with others.
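One way to catch the failing case early is to check the cited keys for collisions under lower-casing before handing anything to the `bibtex` binary. The following is a minimal sketch, not part of our system; `case_collisions` is a hypothetical helper that works on a plain list of keys.

```python
from collections import defaultdict


def case_collisions(keys):
    """Group distinct keys that agree in lower case.

    Returns {lower-cased key: [variants in first-seen order]} for every
    group that contains more than one distinct spelling.
    """
    groups = defaultdict(list)
    for key in keys:
        variants = groups[key.lower()]
        if key not in variants:  # ignore exact repeats of the same key
            variants.append(key)
    return {folded: variants
            for folded, variants in groups.items()
            if len(variants) > 1}


# The keys cited in the LaTeX example above.
cited = ["camelcase", "CamelCase", "another"]
collisions = case_collisions(cited)
```

A non-empty result means the document relies on keys that only `biber` would keep apart.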
 
**Describe the solution you'd like**

The bottom line here is that software tools that handle the bibtex file format are inconsistent in how they treat keys. It seems useful to offer options for `bibtexparser` to emulate the behavior of other tools that process the bibtex file format. This can be customized through middleware, and it might be useful to have additional standard middleware classes to support the different behaviors required. A replacement for the bibtex file format also seems long overdue. There are too many nonstandard entry types and field types. It's probably too late to fix the definition of the bibtex file format unless we add something like `@version` at the beginning of the file to say what tools the file is intended to be processed with. I don't think that's the job of bibtexparser, though, unless it is used in a tool that replaces the bibtex or biber tools.

I would be willing to contribute a PR to offer other middleware to handle these `cases`.


More flexible handling of case sensitivity in all keys #477
