Omeka-XML output parsing errors

In both admin and public views, the omeka-xml links generate XML parsing errors, apparently because the OCRed text contains characters that aren't valid in XML. Any ideas on how to get around that? DCXML works fine (because it doesn't contain the OCRed text).

Here's an example error:

XML Parsing Error: not well-formed
Location: URL/items?output=omeka-xml
Line Number 67, Column 1:
National Organic Promotion, Research and
^

The offending character in this case is Form Feed (FF, U+000C), a control character that XML 1.0 does not allow: http://www.fileformat.info/info/unicode/char/000C/index.htm

The answer seems to be to html_escape the text for elementTextId="14977", but I have no idea how to go about doing so.

It looks like this happens in /libraries/Omeka/Output/OmekaXml/AbstractOmekaXml.php, where _createElement uses PHP's DOM to create the text node. So it's kinda buried in core. If a fix there does the trick, pull requests make us happy! Otherwise, it would mean writing a plugin that overrides the core output and produces its own fixed version.
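
For anyone landing here later, a rough sketch of the general idea, not the actual core code or the eventual fix: strip the characters XML 1.0 forbids before handing the string to DOM. Escaping alone probably won't help, since a form feed isn't allowed in XML 1.0 even as a character reference. The helper name stripInvalidXmlChars and the "text" element name are made up for illustration, and where exactly this would be called inside _createElement is an assumption.

```php
<?php
// Hypothetical helper, not part of Omeka: drop characters that XML 1.0 forbids.
function stripInvalidXmlChars(string $text): string
{
    // XML 1.0 permits tab, LF, CR, and the ranges listed here; everything
    // else (including Form Feed, U+000C) is removed.
    $cleaned = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // With the /u flag, preg_replace returns null when the input is not
    // valid UTF-8; fall back to an empty string in that case.
    return $cleaned ?? '';
}

// Stand-in for OCRed text with a form feed between two pages.
$ocrText = "End of page one\fNational Organic Promotion, Research and";

$doc = new DOMDocument('1.0', 'UTF-8');
$element = $doc->createElement('text'); // illustrative element name
$element->appendChild($doc->createTextNode(stripInvalidXmlChars($ocrText)));
$doc->appendChild($element);

// The serialized output no longer contains the form feed.
echo $doc->saveXML();
```

Dropping the character silently loses the page-break information from the OCR, so swapping it for a newline or a space instead might be the better trade-off depending on how the text gets used.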

Ohhh, fine. ;)

Pull request submitted.