Attacks such as cross-site scripting, HTTP header injection, and SQL injection take advantage of weaknesses in the way some web applications handle incoming character strings. One technique for defending against injection vulnerabilities is to sanitize untrusted strings using encoding methods. These methods convert the reserved characters in a string to an inert representation, which prevents unwanted side effects when the string is later interpreted. However, encoding methods that are insufficiently thorough or improperly integrated into applications can pose a significant security risk.
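As a concrete illustration of what is meant by an encoding method, the following minimal sketch replaces the characters reserved in HTML with character entities. The class name `HtmlEncoder`, the method name `encode`, and the choice of five reserved characters are assumptions made for this example; they are not taken from any particular library or from the applications analyzed here.

```java
// Illustrative only: a hypothetical HTML encoder, not code from any real library.
final class HtmlEncoder {
    private HtmlEncoder() {}

    /** Replaces five characters reserved in HTML with inert character entities. */
    static String encode(String untrusted) {
        StringBuilder out = new StringBuilder(untrusted.length() + 16);
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

An encoder of this shape that omitted, say, the two quote cases would still appear to work on most inputs, which is the kind of insufficiently thorough method the paragraph above warns about.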
This paper outlines an algorithm for identifying encoding methods through automated analysis of Java bytecode. The approach combines an efficient heuristic search with selective rebuilding and execution of likely candidates. This combination provides a scalable and accurate technique for identifying and profiling code that could constitute a serious weakness in an application.
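The two-stage idea of a cheap filter followed by execution of the surviving candidates can be sketched roughly as follows. This is not the algorithm developed in this paper: it uses reflection on already loaded classes as a stand-in for bytecode analysis and rebuilding, and the signature-based filter, the probe set `RESERVED`, and the classes `EncoderProbe` and `SampleTargets` are all assumptions introduced for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical targets used only to exercise the sketch.
final class SampleTargets {
    // Encodes &, < and > but forgets both quote characters.
    static String weakEncode(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
    // Not String -> String, so the filter below should skip it.
    static int count(String s) { return s.length(); }
}

final class EncoderProbe {
    // Assumed probe set: characters reserved in HTML.
    static final String RESERVED = "<>&\"'";

    // Stage 1 (assumed heuristic): static methods mapping one String to a String.
    static boolean looksLikeCandidate(Method m) {
        return Modifier.isStatic(m.getModifiers())
                && m.getReturnType() == String.class
                && m.getParameterCount() == 1
                && m.getParameterTypes()[0] == String.class;
    }

    // Stage 2: run the candidate on each reserved character and
    // collect the characters that come back unchanged.
    static String survivingCharacters(Method m) {
        StringBuilder surviving = new StringBuilder();
        try {
            m.setAccessible(true);
            for (char c : RESERVED.toCharArray()) {
                String probe = String.valueOf(c);
                Object result = m.invoke(null, probe);
                if (probe.equals(result)) {
                    surviving.append(c);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not execute " + m, e);
        }
        return surviving.toString();
    }

    // Maps each candidate method name to the reserved characters it leaves unencoded.
    static Map<String, String> profile(Class<?> target) {
        Map<String, String> report = new TreeMap<>();
        for (Method m : target.getDeclaredMethods()) {
            if (looksLikeCandidate(m)) {
                report.put(m.getName(), survivingCharacters(m));
            }
        }
        return report;
    }
}
```

Calling `EncoderProbe.profile(SampleTargets.class)` in this sketch reports that `weakEncode` leaves both quote characters unencoded, while `count` is never executed because the filter rejects it. The cheap filter is what keeps execution, the expensive step, confined to a small number of methods.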