Python's configparser Module

Solution Found

I'm working on a Python program that checks git repositories on your hard drive to see if they have uncommitted code or are out of date with their remotes. If you're wondering why, it's because I have a number of repos on my drive, at least one of which I can update without even realizing I've done so. That would be /home/giles/.vim/, which I often edit on the fly while editing some other type of file - updating vim's behaviour. So I run the check program and it tells me my repositories' statuses, so that when I get to another machine I can pull down all the updates rather than realizing a change is uncommitted on another machine I can't currently access.

I want to have a list of repositories to check in a format that's easy to update and not embedded in the Python program. configparser (link to official docs), which is part of the Python Standard Library, looks like just the thing ... until you start finding out the details.

The type of config file it expects to be handed looks like this:

[t1]

Alice = i
Bobby Sue = i
Charlie = s

# Section t2 deleted
; This is also a comment

[t3]

Aaron
Ben
Charleen

(Fascinating: the Pelican blog software uses the Pygments lexer, which considers the singular key values broken. We'll get to that.)

This is modelled on Windows' way of writing configurations, divided into sections with a number of key-value pairs. Comments can begin with "#" or ";" but by default cannot be inline (i.e. following a piece of data on a line). The last section has keys but no values: this is possible only if you use the allow_no_value option.

To parse and print this, we could use:

import configparser

# allow_no_value=True lets section [t3] hold keys that have no value
config = configparser.ConfigParser(allow_no_value=True)
config.read("configtest")
for section in config.sections():
    for key in config[section]:  # iterate over every key in this section
        print(key + " " + str(config[section][key]))

This results in the following output:

alice i
bobby sue i
charlie s
aaron None
ben None
charleen None
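On the earlier point about inline comments: here is a minimal sketch of the default behaviour, assuming Python 3. The sample text and the variable names are made up for illustration. The inline_comment_prefixes argument is a real ConfigParser option, but it is off by default, so a trailing "; note" ends up as part of the value unless you opt in.

```python
import configparser

# Hypothetical sample: the trailing "; note" is meant as an inline comment.
SAMPLE = """\
[t1]
Alice = i ; note
"""

# Default parser: inline comments are not recognised.
default_parser = configparser.ConfigParser()
default_parser.read_string(SAMPLE)
default_value = default_parser["t1"]["Alice"]  # comment text stays in the value

# Opt in to ";" as an inline comment prefix.
inline_parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
inline_parser.read_string(SAMPLE)
inline_value = inline_parser["t1"]["Alice"]  # comment is stripped

print(repr(default_value))
print(repr(inline_value))
```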

The lesser problem is that it returns the configuration file data it retrieves not, as you'd expect, as a plain dictionary, but as a "dictionary-like object": each section supports dictionary-style access, and config.items(section) hands back a list of two-tuples. I suppose this is to allow for the allow_no_value option, which allows the config to hold a set of - in effect - keys without values (section "[t3]" in the config file shown above).
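A small sketch of what that "dictionary-like object" looks like in practice, assuming Python 3. The sample is a trimmed copy of the config above, fed in with read_string so the snippet needs no file on disk. The variable names are illustrative only.

```python
import configparser

# Trimmed version of the config file shown earlier.
SAMPLE = """\
[t1]
Alice = i
Charlie = s

[t3]
Aaron
Ben
"""

config = configparser.ConfigParser(allow_no_value=True)
config.read_string(SAMPLE)

section = config["t1"]
print(type(section).__name__)  # the section object is not a plain dict

# A section can be copied into a real dictionary when one is needed.
as_dict = dict(section)
no_values = dict(config["t3"])  # keys without values map to None

# items() on a section name returns a list of (key, value) two-tuples.
pairs = config.items("t1")

print(as_dict)
print(no_values)
print(pairs)
```

If a plain dictionary is all you want, dict(config[section]) appears to be enough, and a key with no value simply maps to None.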

The much greater problem - and it's a show-stopper - is that configparser lower-cases all the keys. The documentation is pleased to point out that this makes key lookups case-insensitive (so there's something more going on with that 'dictionary-like object', because Python dictionaries are normally case-sensitive): print(config["t1"]["Bobby Sue"]) works fine (the response is "i"), but so does print(config["t1"]["bobby sue"]) and even print(config["t1"]["bOBBY sUE"]). All of which is fine if you know the keys you're looking for. But what if you want to iterate over an unpredictable set of keys - say, a set of directories on the hard drive? If the config file contains /home/giles/OpenWRT/ and we iterate over the keys, configparser returns /home/giles/openwrt/, which git cannot find because Unix file systems are case-sensitive (I guess this would all work on Windows - since when does Python favour Windows behaviour over Unix?!).
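To make the problem concrete, here is a minimal sketch assuming Python 3 and a hypothetical [Repositories] section holding two of the paths mentioned in this post. It uses read_string instead of a file so it can be run as-is.

```python
import configparser

# Hypothetical repository list; the keys are paths with no values.
SAMPLE = """\
[Repositories]
/home/giles/OpenWRT/
/home/giles/.vim/
"""

config = configparser.ConfigParser(allow_no_value=True)
config.read_string(SAMPLE)

# Iterating over the section yields the keys as the parser stored them.
keys = list(config["Repositories"])
print(keys)

# Lookups are case-insensitive, which hides the problem until you iterate.
found = "/home/giles/OPENWRT/" in config["Repositories"]
print(found)
```

The mixed-case path comes back as /home/giles/openwrt/, which is no longer a directory that exists on a case-sensitive file system.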

I noticed that configparser doesn't mangle the values, only the keys. So it occurred to me that perhaps I could reverse the order - duplicate keys would be okay because it's not actually a dictionary, right?:

[t1]

i = Alice
i = Bobby Sue
s = Charlie

But no: configparser.DuplicateOptionError: While reading from 'configtest' [line 4]: option 'i' in section 't1' already exists. This is where the "dictionary-like" behaviour comes in.
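The error can be reproduced in a few lines. This is a sketch assuming Python 3, with the reversed config from above passed in as a string. It also tries the parser's strict=False argument, which silences the error but, as far as I can tell, just keeps the last value for a repeated key, so it doesn't rescue the idea either.

```python
import configparser

# The reversed config: values as keys, with "i" appearing twice.
SAMPLE = """\
[t1]
i = Alice
i = Bobby Sue
s = Charlie
"""

# Default (strict) parser: a duplicate key in one section is an error.
strict_parser = configparser.ConfigParser()
try:
    strict_parser.read_string(SAMPLE)
    error = None
except configparser.DuplicateOptionError as exc:
    error = exc
print(type(error).__name__)

# Non-strict parser: no error, but only one value per key is kept.
lenient_parser = configparser.ConfigParser(strict=False)
lenient_parser.read_string(SAMPLE)
lenient_values = dict(lenient_parser["t1"])
print(lenient_values)
```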

I had already written my own simple config parser before I realized configparser existed. I got excited about it because it's always better to not re-invent the wheel, right? And it allowed sections, so I could have a "Colours" section to allow the user to assign their own colours to the output, not just list directories. Unfortunately, I worked on the "Colours" section first, so I didn't immediately realize the full implications of the lower-casing behaviour. So now I'm looking at having to back out about 10 hours of work.

Solution

I thought "casing should be an option, not a requirement." And I did read the docs, although apparently not all of them, and not with enough background knowledge. The answer - the option - is there, but only visible to someone already very familiar with Python. So it was thanks to the ever-helpful Stack Overflow that I actually found my solution, although I still don't understand it. Maybe I needed to rage at the problem a bit until I finally figured out the right search to find the answer. The Stack Overflow answer addresses the same question for Python 2, but the solution applies more or less intact to Python 3:

config = configparser.ConfigParser()
config.optionxform = str

(The "optionxform" assignment has to be made before you read the configuration file in.) Knowing to look for "optionxform," a re-reading of the docs provides another solution:

config = configparser.ConfigParser()
config.optionxform = lambda option: option

which I understand even less than the Stack Overflow solution. Both work for me: they make keys case-sensitive. But I don't know why, or what other side effects the latter might be creating.
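For what it's worth, the docs describe optionxform as the function the parser calls on every option name, and the default one lower-cases its argument. Both str and the lambda hand a string back unchanged, so the two solutions should be equivalent. The sketch below, assuming Python 3 and the same hypothetical [Repositories] section as before, checks that reading of it. The one side effect it shows is that lookups become case-sensitive too.

```python
import configparser

# Hypothetical repository list; the keys are paths with no values.
SAMPLE = """\
[Repositories]
/home/giles/OpenWRT/
/home/giles/.vim/
"""

# The default transform lower-cases whatever option name it is given.
default_result = configparser.ConfigParser().optionxform("Bobby Sue")

config = configparser.ConfigParser(allow_no_value=True)
config.optionxform = str  # must be set before reading the config
config.read_string(SAMPLE)

keys = list(config["Repositories"])
print(keys)  # the original case is preserved

# Side effect: a lookup with the wrong case no longer matches.
wrong_case_found = "/home/giles/openwrt/" in config["Repositories"]
print(wrong_case_found)

# The lambda from the docs returns its argument unchanged, just like str.
identity = lambda option: option
same_behaviour = all(identity(k) == str(k) for k in keys)
```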