Hack 82 Remove Color from Messages
Parsing messages is difficult when they contain
color characters. Make messages easier to store and parse by removing
these characters.
Whether you are trying to
parse messages on the fly or store them in a different format, you
will notice that people who use colored messages throw a monkey
wrench in the works. Adding color to messages means adding lots of
spurious control characters. These have to be removed for the message
to make sense to anything that isn't an IRC client.
If you take a raw message from an IRC channel and paste it directly
onto a web page, it will appear quite different from the colored
version in your IRC client. You will see no color at all. Instead,
you will see the message with extra control characters and digits
sprinkled through it.
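To make those extra characters visible, here is a minimal Python sketch. The sample message and the helper name show_control_chars are made up for illustration; the helper simply replaces each non-printable character with a readable escape.

```python
# Hypothetical raw message: red-on-black "Hello", then a bare 0x03.
raw = "\x034,1Hello\x03 world"

def show_control_chars(message):
    """Replace non-printable characters with visible \\xNN escapes."""
    return "".join(
        ch if ch.isprintable() else "\\x%02x" % ord(ch) for ch in message
    )

print(show_control_chars(raw))
```

The digits and comma that follow the control character are ordinary text, which is why they show up on a web page even though the control character itself is invisible.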
One particular situation in which it is useful to remove colors is
when you are running an artificial intelligence bot, which learns by
reading what other users send to the channel. Removing the special
color characters is essential here; otherwise, the bot will get
confused and end up speaking multicolored gibberish.
13.6.1 Simple Color Removal
Let's create some code to
remove simple colors. Simple colors are marked by the control
character 0x03 and are followed by one or two
digits. The number after the control character should be between 0
and 15 inclusive, but may contain an optional leading zero to bulk it
up to two digits. Most IRC clients treat any value (00-99) as a valid
color, although only 0-15 are clearly defined.
An optional background color may be specified by appending a comma to
the foreground color code. This is followed by another one- or
two-digit code to specify the background color. You must also take
this into account when you remove color codes from a message.
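The format described above can be sketched as a pattern with two capture groups. This is only an illustration in Python; the name find_color_codes is hypothetical, and the function reports the codes it finds rather than removing them.

```python
import re

# 0x03, one or two digits, then an optional comma and one or two more digits.
COLOR_CODE = re.compile("\x03([0-9]{1,2})(?:,([0-9]{1,2}))?")

def find_color_codes(message):
    """Hypothetical helper: list (foreground, background) pairs in order."""
    pairs = []
    for match in COLOR_CODE.finditer(message):
        fg = int(match.group(1))
        # The background group is None when no ",digits" part is present.
        bg = int(match.group(2)) if match.group(2) is not None else None
        pairs.append((fg, bg))
    return pairs
```

A comma that is not followed by a digit is not part of the color code, so it is left alone and the background is reported as None.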
13.6.1.1 Perl solution
Using regular expressions, this is a trivial one-liner. The following
line removes simple coloring from the input:
$input =~ s/\x03[0-9]{1,2}(,[0-9]{1,2})?//g;
13.6.1.2 Python solution
The Python regular expression module lets you apply the same
replacement to a Python variable:
import re
# sub() returns a new string, so assign the result back.
input = re.compile("\x03[0-9]{1,2}(,[0-9]{1,2})?").sub("", input)
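As a rough usage sketch, the same pattern can be wrapped in a function and tried against a few edge cases. The function name remove_colors and the sample strings are assumptions made for this example.

```python
import re

# Compile once and reuse; the pattern is the one used in the text.
COLOR_RE = re.compile("\x03[0-9]{1,2}(,[0-9]{1,2})?")

def remove_colors(message):
    """Strip simple foreground/background color codes from a message."""
    return COLOR_RE.sub("", message)

samples = [
    "\x0304,01alert\x0304 done",  # leading zeros and a background color
    "\x035, five",                # comma not followed by a digit is kept
    "\x03123",                    # only the first two digits are consumed
]
for sample in samples:
    print(repr(remove_colors(sample)))
```

Note that this pattern needs at least one digit after the control character, so a bare 0x03 with no digits is left in place.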
13.6.1.3 Java solution
Again, with regular expressions available in Java 1.4 and beyond, this
is easy. To remove simple coloring from the input, just do this:
input = input.replaceAll("\u0003[0-9]{1,2}(,[0-9]{1,2})?", "");
13.6.1.4 Java Applet solution
All good Applets should run in Java 1.1, as
there is rarely any guarantee that an end user will have anything
more recent. Most browsers are supplied with a 1.1-compatible Virtual
Machine without the user having to apply any updates.
Being restricted to Java 1.1 makes the process of color removal much
more verbose. Although there are more lines of code, it is no less
efficient than regular expressions would be if they were available.
This method can be used to remove simple coloring from within a Java
Applet:
// A rather long but efficient way of removing colors in Java 1.1.
public static String removeColors(String message) {
    int length = message.length();
    StringBuffer buffer = new StringBuffer();
    int i = 0;
    while (i < length) {
        char ch = message.charAt(i);
        if (ch == '\u0003') {
            i++;
            // Skip "x" or "xy" (foreground color).
            if (i < length) {
                ch = message.charAt(i);
                if (Character.isDigit(ch)) {
                    i++;
                    if (i < length) {
                        ch = message.charAt(i);
                        if (Character.isDigit(ch)) {
                            i++;
                        }
                    }
                    // Now skip ",x" or ",xy" (background color).
                    if (i < length) {
                        ch = message.charAt(i);
                        if (ch == ',') {
                            i++;
                            if (i < length) {
                                ch = message.charAt(i);
                                if (Character.isDigit(ch)) {
                                    i++;
                                    if (i < length) {
                                        ch = message.charAt(i);
                                        if (Character.isDigit(ch)) {
                                            i++;
                                        }
                                    }
                                }
                                else {
                                    // Keep the comma.
                                    i--;
                                }
                            }
                            else {
                                // Keep the comma.
                                i--;
                            }
                        }
                    }
                }
            }
        }
        else if (ch == '\u000f') {
            i++;
        }
        else {
            buffer.append(ch);
            i++;
        }
    }
    return buffer.toString();
}
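The Java 1.1 method does slightly more than the regular expression one-liners: it also drops a 0x03 that has no digits after it, and it drops the 0x0f character. As a hedged sketch, the Python pattern below is an attempt to mirror those branches. It uses [0-9] where the Java code calls Character.isDigit, so treat it as an approximation; the function name is made up for this example.

```python
import re

# Assumption: this mirrors the branches of the Java 1.1 method above.
#   0x03 alone, or 0x03 + digits, or 0x03 + digits + ",digits", or 0x0f.
SCANNER_RE = re.compile("\x03(?:[0-9]{1,2}(?:,[0-9]{1,2})?)?|\x0f")

def remove_colors_like_scanner(message):
    """Hypothetical regex counterpart of the hand-written scanner."""
    return SCANNER_RE.sub("", message)
```

As in the Java code, a comma that follows the foreground color without a digit after it is treated as ordinary text and kept.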
The PircBot API contains a
removeColors method in the
Colors class.
13.6.2 Hacking the Hack
If you have created an IRC bot that writes channel logs to a web
page, why not try to retain the information contained in the
coloring? One adventurous task would be to modify the methods here to
create colored HTML from a message instead of simply removing all
color. This is a much harder task than it first seems, so make sure
you think about it before you start implementing anything.
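As a starting point only, here is a minimal Python sketch of that idea. It rests on several assumptions: the function name colors_to_html is hypothetical, the palette is a set of CSS color names that only loosely matches what clients display, codes above 15 are ignored, and other formatting characters such as bold or underline are not handled at all.

```python
import html
import re

# Assumed palette for codes 0-15; real clients differ in exact shades.
PALETTE = [
    "white", "black", "navy", "green", "red", "maroon", "purple", "orange",
    "yellow", "lime", "teal", "aqua", "blue", "fuchsia", "gray", "silver",
]

# The capturing group makes split() return the color codes as tokens too.
TOKEN = re.compile("(\x03(?:[0-9]{1,2}(?:,[0-9]{1,2})?)?|\x0f)")

def colors_to_html(message):
    """Hypothetical helper: wrap colored runs of text in <span> elements."""
    out = []
    fg = bg = None
    for token in TOKEN.split(message):
        if not token:
            continue
        if token == "\x03" or token == "\x0f":
            fg = bg = None  # bare 0x03 or 0x0f: back to default colors
        elif token.startswith("\x03"):
            parts = token[1:].split(",")
            fg = int(parts[0])
            if len(parts) == 2:
                bg = int(parts[1])
        else:
            text = html.escape(token)  # never emit raw user text as HTML
            styles = []
            if fg is not None and fg < 16:
                styles.append("color:" + PALETTE[fg])
            if bg is not None and bg < 16:
                styles.append("background-color:" + PALETTE[bg])
            if styles:
                text = '<span style="%s">%s</span>' % (";".join(styles), text)
            out.append(text)
    return "".join(out)
```

Escaping the text before wrapping it matters here, because channel messages are untrusted input and would otherwise be able to inject markup into the log page.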