There are many components to automating this kind of thing; in fact, people have built entire companies and product lines around (as far as I can tell) essentially this problem:
However, I was fairly sure there had to be something open source out there already to use as a starting point. My initial googling wasn't very successful (a lot of things are called "licenses"), but then I had the bright idea to add "Debian" to my search. It turns out there's a license-analyzing script in one of their packages:
There is also:
It looks kind of frightening, but it may be useful.
The Debian script supports far fewer licenses than the Fedora wiki page on this topic lists; however, it would probably still be quite useful to run it over the whole source tree as a first pass. I bet you'd find a number of cases where the license today is specified as just GPL but the code also contains material under other licenses.
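To make that first pass concrete, here is a minimal Python sketch. It is not the Debian script. It walks a source tree, reads the first few kilobytes of each file, checks them against a handful of license keyword patterns, and flags files that mention something other than the declared license. The patterns in `LICENSE_HINTS`, the `HEADER_BYTES` limit, and the `flag_non_gpl` name are hypothetical choices for illustration. A real scanner would need many more patterns, and more precise ones.

```python
import os
import re

# Hypothetical keyword patterns for illustration only; a real scanner needs far more.
LICENSE_HINTS = {
    "GPL": re.compile(r"GNU General Public License", re.IGNORECASE),
    "LGPL": re.compile(r"GNU (Lesser|Library) General Public License", re.IGNORECASE),
    "MIT": re.compile(r"Permission is hereby granted, free of charge", re.IGNORECASE),
    "BSD": re.compile(r"Redistribution and use in source and binary forms", re.IGNORECASE),
    "Apache": re.compile(r"Apache License", re.IGNORECASE),
}

# Assumption: license notices usually sit near the top of a file.
HEADER_BYTES = 8192


def hints_for_text(text):
    """Return the set of license names whose keyword pattern appears in text."""
    return {name for name, pattern in LICENSE_HINTS.items() if pattern.search(text)}


def flag_non_gpl(root, declared="GPL"):
    """Yield (path, sorted_hints) for files under root that mention licenses beyond `declared`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                # errors="replace" lets binary files pass through without raising.
                with open(path, "r", encoding="utf-8", errors="replace") as f:
                    head = f.read(HEADER_BYTES)
            except OSError:
                continue  # unreadable file: skip it rather than abort the whole scan
            hints = hints_for_text(head)
            if hints - {declared}:
                yield path, sorted(hints)
```

Usage would be something like `for path, hints in flag_non_gpl("path/to/source"): print(path, hints)`. The output is only a list of files for a human to look at. A keyword hit does not establish that a file is actually under that license.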
A more advanced next step would be to associate each license in the wiki's list with a set of fuzzy text segments combined with regular expressions.
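Here is one way to read that suggestion, written as a minimal Python sketch. Each license gets a few regular expressions that must match exactly, plus a few longer text segments that only need to match approximately. The fuzzy part is scored with `difflib.SequenceMatcher` over sliding word windows. The two-entry `LICENSES` table, the comment-stripping in `normalize`, and the 0.9 threshold are all hypothetical placeholders. They are not taken from the Fedora wiki list or the Debian script.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop common comment markers, and collapse whitespace."""
    text = re.sub(r"[#*/;]+", " ", text.lower())  # hypothetical set of comment characters
    return " ".join(text.split())


def best_fuzzy_ratio(segment, text):
    """Best SequenceMatcher ratio between segment and any same-length word window of text."""
    seg = normalize(segment)
    words = normalize(text).split()
    k = len(seg.split())
    if len(words) < k:  # text shorter than the segment: compare it whole
        return SequenceMatcher(None, seg, " ".join(words)).ratio()
    best = 0.0
    for i in range(len(words) - k + 1):
        window = " ".join(words[i:i + k])
        best = max(best, SequenceMatcher(None, seg, window).ratio())
    return best


# Hypothetical entries; real ones would be derived from the wiki's license list.
# Regexes are written against normalized (lowercase, single-spaced) text.
LICENSES = {
    "MIT": {
        "regexes": [re.compile(r"permission is hereby granted")],
        "segments": [
            "the above copyright notice and this permission notice shall be included "
            "in all copies or substantial portions of the software",
        ],
    },
    "BSD-3-Clause": {
        "regexes": [re.compile(r"redistribution and use in source and binary forms")],
        "segments": [
            "neither the name of the copyright holder nor the names of its contributors "
            "may be used to endorse or promote products derived from this software",
        ],
    },
}


def identify(text, threshold=0.9):
    """Return licenses whose regexes all match and whose segments all fuzzily match."""
    norm = normalize(text)
    found = []
    for name, rule in LICENSES.items():
        # Cheap exact checks first; skip the fuzzy scoring if any regex misses.
        if not all(r.search(norm) for r in rule["regexes"]):
            continue
        if all(best_fuzzy_ratio(s, norm) >= threshold for s in rule["segments"]):
            found.append(name)
    return found
```

The regexes run first, which keeps the slower fuzzy comparison away from files that plainly don't qualify. The fuzzy segments then tolerate reflowed comments and small typos that would defeat a pure regular expression.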