I've been playing around with Reverend to get it to guess appropriate "Sections" for issues on the Real issuetracker. I downloaded all 140 issue texts along with their "Sections" attribute, which is a list (often of length 1). From this dataset I looped over each text and the sections within it, skipping the default section General, so something like this:
from reverend.thomas import Bayes

guesser = Bayes()

data = ({'sections': ['General', 'Installation'],
         'text': "bla bla bla..."},
        {'sections': ['Filter functions'],
         'text': "Lorem ipsum foo bar..."},
        # ...
        )

for item in data:
    # train on every section of the issue except the default one
    secs = [each for each in item['sections'] if each != 'General']
    for section in secs:
        guesser.train(section, item['text'])
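
To check what actually gets fed to the trainer, the loop can be pulled out into a small generator that runs without Reverend installed. This is only a sketch: `training_pairs`, its `skip` parameter and the sample issues are hypothetical names made up for illustration, not part of Reverend or of the real dataset.

```python
def training_pairs(data, skip=("General",)):
    """Yield one (section, text) pair per non-skipped section of each issue."""
    for item in data:
        for section in item["sections"]:
            if section not in skip:
                yield section, item["text"]


# Made-up sample issues in the same shape as the data above.
sample = (
    {"sections": ["General", "Installation"], "text": "bla bla bla..."},
    {"sections": ["Filter functions"], "text": "Lorem ipsum foo bar..."},
    {"sections": ["General"], "text": "only the default section"},
)

pairs = list(training_pairs(sample))
print(pairs)
# [('Installation', 'bla bla bla...'), ('Filter functions', 'Lorem ipsum foo bar...')]
```

An issue filed only under General contributes nothing, and an issue with several sections contributes its text once per section. Each resulting pair would then go to `guesser.train(section, text)`.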