A couple of times in the course of my job, I’ve needed to tidy up user input of postal addresses, usually when importing from an external database into a web application or taking input from a web form. It looks horrible when everything is in upper or lower case.
PHP has built-in functions such as ucwords(), but none of these does just what I needed, hence the function below. Googling found a few attempts to solve the same problem, and I’ve probably pinched bits of code from various places. If I’ve pinched some of yours, let me know and I’ll acknowledge your contribution.
Here’s a quick run-down of what it does when you feed it a string…
Lines 4, 5 & 6 just strip white-space from either end of the string and remove any line breaks and newlines.
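As a rough illustration of that clean-up step (a sketch, not the original lines), something like the following would do. The helper name tidy_whitespace is hypothetical, and I’m assuming line breaks should become a space so that words on adjacent lines don’t run together.

```php
<?php
// Hypothetical helper: replaces line breaks with spaces and trims both ends.
function tidy_whitespace(string $text): string
{
    // Assumption: each run of carriage returns/newlines becomes one space.
    $text = preg_replace('/[\r\n]+/', ' ', $text);
    // Strip white-space from either end of the string.
    return trim($text);
}

echo tidy_whitespace("  10 downing street\r\nlondon \n"); // prints "10 downing street london"
```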
Line 7 breaks the string into words on space characters and the loop from line 8 to line 18 deals with each word, one at a time.
Line 9 does most of the work, converting the word first to lower case and then capitalising it, which is great for the vast majority of words, but what about Scottish and Irish names?
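The split-and-capitalise step might look roughly like this. It is a minimal sketch using PHP’s explode(), strtolower() and ucfirst(); the helper name capitalise_words is made up for illustration and isn’t taken from the function itself.

```php
<?php
// Hypothetical helper: lower-cases each space-separated word,
// then upper-cases its first letter.
function capitalise_words(string $text): array
{
    $words = explode(' ', $text);
    foreach ($words as $i => $word) {
        $words[$i] = ucfirst(strtolower($word));
    }
    return $words;
}

echo implode(' ', capitalise_words('10 DOWNING street')); // prints "10 Downing Street"
```

Note that strtolower() and ucfirst() work on bytes, so accented characters would probably need different handling.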
That’s what lines 11-17 deal with. Essentially, given a list of prefixes declared as $specials at line 11 (which you can add to if needed), we find any words that begin with one of those prefixes and capitalise the fragment of the word that follows. For example, the word being processed arrives at line 12 as ‘Macdonald’, and the code goes through each of the ‘specials’ to see if our word starts with that prefix. It does in the case of ‘Mac’, so we capitalise ‘donald’ and put the two fragments back together as ‘MacDonald’.
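Here is one way the prefix check could be sketched. The contents of $specials below are my assumption for illustration (only ‘Mac’ is mentioned above), and the helper name capitalise_after_prefix is hypothetical.

```php
<?php
// Hypothetical helper: capitalises the part of a word that follows a known prefix.
function capitalise_after_prefix(string $word): string
{
    // Assumed prefix list; the real $specials may hold different entries.
    $specials = ["O'", 'Mac', 'Mc'];
    foreach ($specials as $prefix) {
        $length = strlen($prefix);
        // Only act when the word starts with the prefix and has something after it.
        if (strlen($word) > $length && strncmp($word, $prefix, $length) === 0) {
            return $prefix . ucfirst(substr($word, $length));
        }
    }
    return $word;
}

echo capitalise_after_prefix('Macdonald'); // prints "MacDonald"
```

The comparison is case-sensitive, so this sketch expects the word to have been capitalised already.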
But that messes up ‘Macclesfield’, making it ‘MacClesfield’, so lines 21-24 undo what we just did for another list of ‘specials’, which you can also add to if you wish.
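A simple way to sketch that undo step is a lookup table keyed by the wrongly capitalised form. This swaps in a plain array lookup for whatever the function really does at those lines, and both the helper name restore_exceptions and the single-entry list are assumptions.

```php
<?php
// Hypothetical helper: restores words that the prefix rule capitalised wrongly.
function restore_exceptions(string $word): string
{
    // Assumed exception list: wrongly capitalised form => correct form.
    $exceptions = ['MacClesfield' => 'Macclesfield'];
    // Fall back to the word unchanged when it is not a known exception.
    return $exceptions[$word] ?? $word;
}

echo restore_exceptions('MacClesfield'); // prints "Macclesfield"
```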
Lines 26-29 just make some shorter words (articles and prepositions such as ‘le’ and ‘upon’) lower case, so you end up with ‘Walton le Dale’ or ‘Stratford upon Avon’.
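That last tweak could be sketched as below. The word list is assumed from the two examples, and the helper name lowercase_small_words is hypothetical; add whatever other words your addresses need.

```php
<?php
// Hypothetical helper: lower-cases selected short words in an already capitalised list.
function lowercase_small_words(array $words): array
{
    // Assumed list, based only on the two examples in the text.
    $small = ['Le', 'Upon'];
    foreach ($words as $i => $word) {
        if (in_array($word, $small, true)) {
            $words[$i] = strtolower($word);
        }
    }
    return $words;
}

echo implode(' ', lowercase_small_words(['Stratford', 'Upon', 'Avon'])); // prints "Stratford upon Avon"
```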
Then we put the text back together and return it at line 31. Easy.
Help yourself if you find this function useful. Any links back or other acknowledgements would be more than welcome, but aren’t necessary.
Leave a comment if you find any bugs, or you think you can improve my code.