Tuesday, June 8, 2010

Quick note on sorting john.pot files



I run John the Ripper on multiple boxes at once by splitting up the input password hash files by type (NT, LanMan, DES, MD5, etc.).

Once I've let the individual systems run for a while, I run reports using john --show. This means I need to combine the john.pot files from each system into a single file, with no duplicate entries, on the system I'm running the reports on.

So, I ran the (fairly standard) commands:

$ cat john1.pot john2.pot john3.pot > combined.pot


$ cat combined.pot | sort | uniq > john.pot
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `1234567892031276d66b123456789:user' and `abcdefghijklmnop4ab38:l\4327367qrstuvwxyz'.

Oops! I wasn't expecting that command to bomb out like that. To fix the sort, do exactly as sort suggests: set LC_ALL to 'C' in your shell, then re-run the sort command. Type:

$ export LC_ALL='C'
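
The same fix can also be scoped to a single command rather than exported for the whole session. The following is a minimal sketch under stated assumptions, not the exact commands from this post: the sample hashes and passwords are made up for illustration (the \351 byte stands in for whatever non-UTF-8 byte tripped up sort), and sort -u replaces the separate cat and uniq steps.

```shell
# Hypothetical sample pot files (hash:password per line).
# The \351 byte on its own is not a valid UTF-8 sequence.
printf 'hash1:password\nhash2:caf\351\n' > john1.pot
printf 'hash2:caf\351\nhash3:letmein\n' > john2.pot
printf 'hash1:password\n' > john3.pot

# Setting LC_ALL on the same line applies it to this sort only;
# the rest of the shell session keeps its normal locale.
# -u drops duplicate lines, so no separate uniq step is needed.
LC_ALL=C sort -u john1.pot john2.pot john3.pot > john.pot

# Count the unique entries that survived the merge.
wc -l < john.pot
```

Because sort reads all three files directly, no intermediate combined.pot is needed in this variant.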




What's going on here?

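In a UTF-8 locale, sort compares lines using the locale's collation rules, and that comparison fails on byte sequences that aren't valid in that encoding. The C locale compares raw bytes instead, so any input will sort. I found this reference page via Google. Here are the relevant bits: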

LC_ALL
This variable determines the values for all locale categories. The value of the LC_ALL environment variable has precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable.
If the LANG environment variable is not set or is set to the empty string, the implementation-dependent default locale is used.

If the locale value is "C" or "POSIX", the POSIX locale is used and the standard utilities behave in accordance with the rules in POSIX Locale, for the associated category.
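
The precedence rule quoted above can be checked with the standard locale utility. This is a small sketch, assuming a system where locale is available; en_US.UTF-8 is only an example value for LANG and does not need to match your system's setting.

```shell
# LANG asks for a UTF-8 locale, but LC_ALL takes precedence
# over LANG and over every individual LC_* variable.
collate=$(LANG=en_US.UTF-8 LC_ALL=C locale | grep '^LC_COLLATE=')

# Show the collation category that sort would actually use.
echo "$collate"
```

If LC_ALL wins as described, the collation category should be reported as C even though LANG requests something else.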