How to Perform a Comprehensive Code Security Audit

Preface

For many security researchers, fuzzing is an important vulnerability discovery method, but every method has its limitations. First, effective fuzzing requires accurately identifying the attack surface and automating tests against it, and finding the attack surface is inseparable from code auditing. Second, after an automated test finds a crash, analyzing it and writing the report requires manual auditing to understand the code logic and the root cause of the vulnerability. Finally, for memory-safe languages such as Java or Go, fuzzing applies to even fewer scenarios, and complex logic and design problems can only be discovered through code auditing.

This article is a methodological summary of code auditing. It is not tied to any particular language: the code can be C/C++, Java, or PHP. Sources, sinks, and automated analysis tools specific to individual languages are outside the scope of this article. Interested readers can look out for the follow-up article on code security auditing techniques.

Make plans before you act

Sun Tzu's Art of War says: Make plans before you act. There is also a famous saying in modern management: "If you can't measure it, you can't manage it."

This does not mean that we should ignore complex security issues and only look at simple vulnerabilities. The key is to clarify your audit objectives in advance. An audit task has two sides: the time the auditor can invest and the complexity of the application to be audited. Weighing the two gives a realistic estimate of what the audit can be expected to produce.

Applications to be audited are generally divided into the following categories based on access rights:

Source Code: We only have the target source code, usually without a complete build and test environment. Because key dependencies are missing, it is often impossible to build a runnable program. In this case, we can only audit through static analysis.

Binary Programs: We only have the binary files of the target application, such as an APK, EXE, JAR package, or IoT system firmware. In this case, auditing is usually performed through dynamic analysis and reverse engineering.

Source Code / Binary Programs: We have both the source code of the target application and a runnable binary program, which is the most advantageous level of access for a security audit. The target is usually open source software with a complete build environment and dependencies.

Black Box: We have neither the target source code nor executable binary programs, so we can only perform blind testing through external interfaces. This is more common in Web applications.

This article mainly focuses on code security audits with source code. Of course, some of the strategies and methods used in source code audits are also applicable to other types of applications. When the source code is accessible, an obvious metric is to evaluate the workload by the number of lines of code, although this indicator does not perfectly represent the complexity of the application. After all, the complexity of 1,000 lines of business code is different from that of 1,000 lines of compiler code.

A code auditor can audit about 100 to 1,000 lines of code in an hour, depending on the auditor's experience and understanding of the code. For individuals, the best way to evaluate audit efficiency is to keep a record of your audit time for different components, which can not only help you better understand your own rhythm, but also provide a time reference for subsequent code audit plans.

There are many reasons that affect the speed of code auditing, such as:

Code language: For memory-unsafe languages such as C/C++, more attention needs to be paid to the underlying details; while memory-safe languages such as Java and Python focus more on the upper-level logic implementation.

Coding style: Projects with clean, well-commented code usually take less time to audit than projects without.

It is worth mentioning that although audit time grows with the amount of code, audit efficiency also improves as you audit more of a project, because you understand it more deeply. Auditing 100,000 lines of code therefore usually takes less than twice as long as auditing 50,000 lines. When formulating an audit plan and time budget, consider the factors above carefully.
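As a rough illustration of the planning arithmetic above, the following sketch turns a line count into a best-case and worst-case time estimate using the 100 to 1,000 lines-per-hour range, and derives a personal rate from recorded sessions. The function names and the sample numbers are hypothetical, and the linear model deliberately ignores the efficiency gain on larger projects, so treat the result as a coarse bound rather than a prediction.

```python
def estimate_audit_hours(lines_of_code, slow_rate=100, fast_rate=1000):
    """Return (best_case, worst_case) hours under a simple linear model.

    slow_rate and fast_rate are lines audited per hour; the defaults are
    the rough range quoted in the article.
    """
    if lines_of_code < 0 or slow_rate <= 0 or fast_rate < slow_rate:
        raise ValueError("invalid line count or rates")
    return lines_of_code / fast_rate, lines_of_code / slow_rate


def measured_rate(records):
    """Compute a personal lines-per-hour rate.

    records is a list of (lines_audited, hours_spent) tuples taken from
    your own audit log.
    """
    total_lines = sum(lines for lines, _ in records)
    total_hours = sum(hours for _, hours in records)
    return total_lines / total_hours


if __name__ == "__main__":
    best, worst = estimate_audit_hours(50_000)
    print(f"50,000 lines: between {best:.0f} and {worst:.0f} hours")

    # Hypothetical log: 1,200 lines in 3 hours, then 800 lines in 1 hour.
    rate = measured_rate([(1200, 3.0), (800, 1.0)])
    print(f"personal rate: {rate:.0f} lines/hour")
```

Feeding your measured rate back in as both `slow_rate` and `fast_rate` gives a single estimate tailored to your own rhythm.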

Know when to stop and you will gain something

In the next section, we will introduce some specific strategies and techniques for code security auditing. In practice, some methods may be more effective than others, but experience tells us that it is better to use multiple audit strategies and switch audit methods periodically for a variety of reasons:

You can only maintain a high level of mental focus for a limited time.

Variety helps you maintain discipline and passion.

Different vulnerability types may be easier to spot from other perspectives.

Different people have different ways of thinking.

From a global perspective, the code security audit approach is a simple three-step cycle:

Plan: Audit planning. Based on the existing information, determine the code audit strategy to be used in this phase, as well as the small audit goals, such as completing the audit of a certain module or file (directory), understanding the function of a certain structure, etc.

Work: Execute the audit according to the audit strategy formulated earlier, with the focus on keeping good audit records during the process.

Reflect: After completing the audit of this phase, reflect on whether you have used your time effectively and whether you have deviated from the direction. Then adjust the audit plan for the next phase based on the experience learned from the previous audit, such as re-dividing the structure, focusing on security-related sub-modules, etc.

Therefore, we can often find some simple security issues in the early stages of code auditing, and in the later stages, as we gain a deeper understanding of the application, we can discover more complex logical vulnerabilities and even design flaws.

Everything has its own law

This section introduces some specific code audit strategies. If a code audit is a war, it calls for both strategy and tactics. The strategies below are distilled from the experience of earlier auditors, but they have proven to be an effective methodology.

There are many code audit strategies, but each strategy has similar attributes that represent the characteristics of this strategy. For example:

Starting point: the starting point of code tracing.

End point: The goal of the strategy, or the point where the tracking code ends.

Method: How the code is traced: by data flow or control flow, and in the forward or reverse direction.

Objective: What type of vulnerabilities does this audit strategy target?

Difficulty: Indicates the difficulty of executing the audit strategy, generally from 1 star to 5 stars, indicating increasing difficulty.

Speed: Indicates the execution speed of the audit strategy, also from 1 star to 5 stars, from slow to fast.

Understanding: Indicates the code understanding brought by the audit strategy. Generally, strategies that bring more understanding are more difficult, but they can also help researchers find more complex vulnerabilities.

There are also corresponding advantages and disadvantages. Below are some specific audit strategies.
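The attributes above can be captured in a small record type, which makes it easy to compare strategies when planning a phase. This is only a sketch: the class and function names are hypothetical, and the star ratings are copied from the strategy tables later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditStrategy:
    name: str
    start: str          # starting point of code tracing
    end: str            # where tracing stops
    method: str         # "forward" or "reverse"
    difficulty: int     # 1 (easy) to 5 (hard)
    speed: int          # 1 (slow) to 5 (fast)
    understanding: int  # 1 (little) to 5 (deep)


STRATEGIES = [
    AuditStrategy("Data flow analysis", "data entry points",
                  "vulnerability trigger point", "forward", 4, 1, 4),
    AuditStrategy("Module analysis", "beginning of the file",
                  "end of file", "forward", 5, 2, 5),
    AuditStrategy("Sensitive calls", "potential vulnerability points",
                  "user-controllable input", "reverse", 2, 4, 2),
    AuditStrategy("System modeling", "starting point of the module",
                  "security vulnerabilities", "varies", 4, 2, 5),
]


def pick_low_effort(strategies, max_difficulty=2):
    """Return strategies at or below max_difficulty, fastest first."""
    candidates = [s for s in strategies if s.difficulty <= max_difficulty]
    return sorted(candidates, key=lambda s: s.speed, reverse=True)


if __name__ == "__main__":
    for strategy in pick_low_effort(STRATEGIES):
        print(strategy.name)
```

A filter like `pick_low_effort` is one way to decide what to switch to when concentration starts to drop.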

Strategy 1: Top-down

The first category of audit strategies finds vulnerabilities by analyzing the code directly. It requires reading and understanding the code, which demands more concentration but also brings a deeper understanding of the code. This head-on approach is not a single technique; it can be further divided into the following strategies.

Data Flow Analysis

The first is data flow analysis, which tracks the user's (malicious) input data to find potential vulnerabilities.

Attribute	Value
Starting point	Data entry points, such as function parameters, environment variables, etc.
End point	The final vulnerability trigger point, such as privilege escalation, injection, memory corruption, etc.
Method	Forward analysis, data-flow sensitive, control-flow sensitive
Objective	Discover security vulnerabilities that can be triggered by malicious input
Difficulty	★★★★
Speed	★
Understanding	★★★★

Data flow analysis is probably the code audit method that most people think of first. Its core is to start from malicious input and audit forward along the control flow of the code, usually supplemented by limited data flow analysis. During the audit, the auditor records and tracks how the data propagates, and locates vulnerabilities by combining this with security boundary analysis and knowledge of common vulnerability types.

This method is an effective way to analyze code, but it takes experience to decide which modules and functions to follow; otherwise the number of paths to trace explodes quickly. It is also easy to lose focus partway through, skip an important branch, and miss the vulnerability.

For projects in languages like Java or C++, data flow analysis is often harder, because the original input usually passes through several intermediate classes, and you may open more than a dozen files before reaching the code that actually processes the data. In this case, it is best to use design documents to build a relatively complete threat model; failing that, it is better to use other strategies to understand the system first.
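To make the idea of forward tracking concrete, here is a minimal sketch of taint propagation over a toy, straight-line program. The statement format, the function name, and the example variable and sink names are all hypothetical; a real audit deals with branches, aliasing, and calls across files, which this sketch does not model.

```python
def find_tainted_sinks(statements, sources, sinks):
    """Walk statements in order and report sink calls fed by tainted data.

    statements: list of (kind, name, operands) tuples, where kind is
      "assign" (name = f(operands)) or "call" (name(operands)).
    sources: variable names controlled by the attacker.
    sinks: names of security-sensitive functions.
    """
    tainted = set(sources)
    findings = []
    for lineno, (kind, name, operands) in enumerate(statements, start=1):
        hit = [v for v in operands if v in tainted]
        if kind == "assign":
            if hit:
                tainted.add(name)      # taint propagates to the target
            else:
                tainted.discard(name)  # overwritten with clean data
        elif kind == "call" and name in sinks and hit:
            findings.append((lineno, name, hit))
    return findings


if __name__ == "__main__":
    program = [
        ("assign", "query", ["user_id"]),             # query built from input
        ("assign", "limit", []),                      # constant
        ("call", "log", ["limit"]),                   # not a sink
        ("call", "execute_sql", ["query", "limit"]),  # sink
    ]
    for finding in find_tainted_sinks(program, {"user_id"}, {"execute_sql"}):
        print(finding)
```

Even in this tiny model the auditor must decide which names count as sources and sinks, which mirrors the judgment calls described above.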

Module Analysis

An implicit assumption of module analysis is that code modules are divided into files, so analyzing a module actually means analyzing the corresponding source file.

Attribute	Value
Starting point	Beginning of the file
End point	End of file
Method	Forward analysis, data-flow insensitive, control-flow insensitive
Objective	Read every function in the module and only document potential problems
Difficulty	★★★★★
Speed	★★
Understanding	★★★★★

The process of module analysis is basically to read a source file from beginning to end, without following the external functions it calls or caring about who references the current function, and only recording the problems seen in place. You may think this strategy is crude, but many senior code auditors like it. For example, a former NSA security consultant uses this strategy first when auditing a new code repository: find a directory such as util or tool and read the glue code line by line to form a preliminary understanding of the project's code style.

The advantage of module analysis is that you can quickly understand the code style of the application. For highly cohesive modules, you can analyze them without going out of the file, and you can have a preliminary understanding of the internal implementation of the module. But the disadvantages are also obvious. This analysis method is more laborious, and the brain is easily fatigued after a long period of continuous analysis. In addition, recording all potential problems during the audit process is also a tedious task. After several hours of recording, it is difficult to have the passion to continue. Therefore, if you feel distracted while executing this strategy, it is best to take a break first, or switch to other less intense audit strategies.

Reference Analysis

Reference analysis is similar to module analysis. The main difference is that we focus on the implementation of a class or structure in object-oriented code.

Attribute	Value
Starting point	The implementation of an object
End point	All references to this object (xref)
Method	Forward analysis, data-flow insensitive, control-flow insensitive
Objective	Learn the interfaces and implementations of important objects, and find errors caused by the use of interfaces
Difficulty	★★★★
Speed	★★
Understanding	★★★★★

For object-oriented languages, this audit strategy is more efficient than simple module analysis because objects are often highly cohesive. It is also less likely to lead the audit off course. However, as with module analysis, we need to stay focused throughout, otherwise we may miss something.

Algorithm Analysis

After having a sufficient understanding of the application's system design and data structure, we can select some security-related algorithms and analyze their implementation.

Attribute	Value
Starting point	The beginning of the algorithm
End point	End of the algorithm
Method	Forward analysis, data-flow insensitive, control-flow insensitive
Objective	Analyze algorithm implementations and identify potential design and implementation issues
Difficulty	★★★★★
Speed	★★
Understanding	★★★★★

The effectiveness of this audit strategy depends on how relevant the chosen algorithms are, so we need a certain understanding of the audit target before we can determine which algorithms and code are critical. These critical algorithms usually involve the application's security model or cryptographic implementations, such as the sessionId generation algorithm in a Web application or an encryption and verification algorithm customized by the application developer.

Strategy 2: Bottom-up

This category of strategies is the opposite of the top-down strategies above: it starts from the low-level code that may cause a vulnerability. These strategies usually rely on automated analysis tools to produce candidates, and then trace backward to verify the path that triggers the vulnerability.

Sensitive Calls

Specify a series of sensitive calls and analyze in reverse whether these calls constitute exploitable vulnerabilities. The simplest way is to describe sensitive functions or statements with regular expressions, use text search tools to find candidate locations, and then verify them.

Attribute	Value
Starting point	Potential vulnerability points
End point	Arbitrary user-controllable input
Method	Reverse analysis, data-flow sensitive, control-flow sensitive
Objective	Given a list of potential vulnerabilities, analyze whether they can be triggered and exploited
Difficulty	★★
Speed	★★★★
Understanding	★★

The advantage of this strategy is that it achieves high coverage for known vulnerability types, such as format string bugs and command injection. It is also not mentally taxing, so it is a good way to conserve and recover attention. Because you work from a list of candidates, it is hard to go astray, and the audit advances steadily.
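The regular-expression approach described above can be sketched in a few lines. The patterns below are illustrative assumptions for C code, not a complete or authoritative list, and a plain text match says nothing about whether a hit is reachable from user input; each result still has to be traced back by hand.

```python
import re

# Hypothetical starter patterns for C code; extend them per project.
SENSITIVE_PATTERNS = {
    "command execution": re.compile(r"\b(?:system|popen)\s*\("),
    "unbounded copy": re.compile(r"\b(?:strcpy|strcat|sprintf|gets)\s*\("),
    # printf called with a single non-literal argument, e.g. printf(buf)
    "format string": re.compile(r"\bprintf\s*\(\s*[A-Za-z_]\w*\s*\)"),
}


def scan_source(text):
    """Return (line_number, label, stripped_line) for every pattern hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label, line.strip()))
    return findings


SAMPLE_C = "\n".join([
    "void handle(char *name) {",
    "    char cmd[64];",
    '    sprintf(cmd, "ping %s", name);',
    "    system(cmd);",
    "    printf(name);",
    "}",
])

if __name__ == "__main__":
    for lineno, label, line in scan_source(SAMPLE_C):
        print(f"{lineno}: [{label}] {line}")
```

The output is the candidate list for this strategy: each entry is a starting point for reverse analysis toward user-controllable input.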

Vulnerability Scanning Tools

In addition to manually specifying sensitive calls, we can also use automated static analysis tools to obtain potential vulnerability points. For example, Sonar, Fortify, etc. have relatively complete vulnerability classification lists.

Attribute	Value
Starting point	Potential vulnerability points
End point	Arbitrary user-controllable input
Method	Reverse analysis, data-flow sensitive, control-flow sensitive
Objective	Given a list of potential vulnerabilities, analyze whether they can be triggered and exploited
Difficulty	★
Speed	★★★★
Understanding	★

The analysis process for this strategy is similar to that for sensitive calls. Early static analysis tools did little more than simple text matching, but many current tools can perform relatively accurate data flow and context analysis. Some, such as CodeQL for compiled languages, require a complete build environment, while others, such as weggli and semgrep, can work with source code alone.

A major drawback of these static analysis tools is that they often produce false positives. If only a few of a thousand scan results are real vulnerabilities, auditors can easily overlook them.

Interface Analysis

Sometimes the vulnerability does not lie in a direct call to a sensitive function, but in a class or helper function of the application itself, such as an intermediate command execution function or an ORM database wrapper interface.

Attribute	Value
Starting point	Application object interface or function call
End point	Arbitrary user-controllable input
Method	Reverse analysis, data-flow sensitive, control-flow sensitive
Objective	Given a list of potential vulnerabilities, analyze whether they can be triggered and exploited
Difficulty	★
Speed	★★★★
Understanding	★

Some automated analysis tools can write custom rules for scanning, but in most cases, a simple grep/findstr command is all that is needed to find and filter vulnerabilities. This method requires a certain level of understanding of the code to be audited in order to know which are potential security-sensitive functions, and because we are only constantly searching and jumping in a shallow context, this audit strategy can only help us verify the vulnerability path, and it is of little help in understanding the code.
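When the target happens to be Python code, the search-and-jump process for an application-level wrapper can be partly automated with the standard `ast` module. The sketch below builds a name-based map of callers and walks it backward from a wrapper function. The sample source and its function names (`run_cmd`, `ping`, `handle_request`) are hypothetical, and matching by bare name ignores classes, imports, and dynamic dispatch, so the result is a list of leads rather than a precise call graph.

```python
import ast
from collections import defaultdict, deque


def build_callers(source):
    """Map each called name to the set of functions whose body calls it."""
    callers = defaultdict(set)
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                target = node.func
                if isinstance(target, ast.Name):
                    name = target.id
                else:
                    # obj.method(...) is matched by its attribute name
                    name = getattr(target, "attr", None)
                if name:
                    callers[name].add(func.name)
    return callers


def trace_back(callers, sink):
    """Breadth-first walk from a wrapper to all of its transitive callers."""
    seen, order, queue = set(), [], deque([sink])
    while queue:
        current = queue.popleft()
        for caller in sorted(callers.get(current, ())):
            if caller not in seen:
                seen.add(caller)
                order.append(caller)
                queue.append(caller)
    return order


SAMPLE = '''
def run_cmd(cmd):
    return subprocess.check_output(cmd, shell=True)

def ping(host):
    return run_cmd("ping -c 1 " + host)

def handle_request(params):
    return ping(params["host"])

def healthcheck():
    return "ok"
'''

if __name__ == "__main__":
    print(trace_back(build_callers(SAMPLE), "run_cmd"))
```

Each function in the result is a place to check whether the argument that reaches the wrapper is user-controllable.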

In addition to source code analysis, we can use similar strategies when reverse engineering binary applications, locating potential vulnerability patterns in assembly code. For example, for x86 programs, we can search for MOVSX instructions to find potential integer overflow vulnerabilities, but this is off topic.

Strategy 3: See the big picture from the small details

After auditing with the first two strategies, we have a certain understanding of the code itself. At this point we can take a step back and audit the overall design and implementation of the application. This strategy focuses on high-level design flaws and logic problems, so it can often find serious vulnerabilities that are hidden deeper.

System Modeling

The general development process is to complete the top-level design and scheduling first, and then divide the modules into specific coding implementations. However, for security audits, this process can be reversed, that is, after completing the analysis of the specific implementation, reversely infer the overall design ideas, and then find some components that have not been touched based on this inferred idea.

Attribute	Value
Starting point	Starting point of the module to be audited
End point	Security vulnerabilities
Method	Flexible; adapt to the situation
Objective	Restore the abstract behavior of the module through behavioral modeling and find potential logic and functional flaws
Difficulty	★★★★
Speed	★★
Understanding	★★★★★

This audit strategy is a great choice if you want to have a deeper understanding of the target system; but it also means that the audit speed of this method will not be too fast, because basically we are reversing the original design architecture from the implementation details.

It is worth mentioning that we usually only need to reverse model some core modules, such as the application's security subsystem, input filtering module or other widely used core components. In the process of modeling, we actually put ourselves in the position of developers or architects to reconsider the design of the module, so this is also the best strategy to discover logical vulnerabilities through specific code.

When faced with a large codebase, it is almost impossible to digest the entire application structure at once, so we need to prune the code. Specifically, first make some assumptions about the code under audit, and then verify them through actual testing. Whether or not the assumptions turn out to be correct, testing deepens our understanding of the system.

Security Boundary

The goal of this audit strategy is to restore the security boundaries preset by developers or security architects from the code implementation, so as to further audit the restored security boundaries and build a threat model of actual attacks.

Attribute	Value
Starting point	All security-related validation and check code
End point	Security vulnerabilities
Method	Flexible; adapt to the situation
Objective	Use known security-related code to infer the security boundaries of the target design
Difficulty	★★★★
Speed	★★★
Understanding	★★★★★

A specific method is to collect and record the security verification code snippets in the code, then classify and organize these security boundary checks, and finally summarize the original security level division. These original security verifications are an important source of information for modeling the application security boundary. The advantage of this strategy is that it allows us to focus on security-related code areas and build a more complete design architecture.

Design Verification

If we have the design documents or specification manual of the target application, then an intuitive audit strategy is to find undefined behaviors or conflicts by comparing the specifications and implementation code.

Attribute	Value
Starting point	Starting point of the module
End point	End of module
Method	Forward analysis, control-flow sensitive, data-flow sensitive
Objective	Discover vulnerabilities in code implementation that differ from the design
Difficulty	★★★
Speed	★★★
Understanding	★★★

Although we rarely have such detailed documentation, we can still use this strategy to discover vulnerabilities where the implementation departs from the design. By focusing on gray areas in the implementation, or on the boundary handling of conditional branches, we can roughly infer which behaviors the documentation leaves undefined. In short, our goal is to first infer and restore the main functions and normal behaviors of the target module, and then focus on auditing the edge cases.
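One way to picture the comparison between specification and implementation is as a set difference. The sketch below assumes the auditor has written down the state transitions a specification allows and the transitions actually recovered from the code; the state names and both sets are hypothetical.

```python
def diff_transitions(spec, impl):
    """Compare (from_state, to_state) pairs from the spec and the code."""
    return {
        # present in the code but never described by the specification
        "undocumented": sorted(impl - spec),
        # promised by the specification but not found in the code
        "unimplemented": sorted(spec - impl),
    }


# Hypothetical transitions taken from a design document.
SPEC = {
    ("INIT", "AUTHENTICATED"),
    ("AUTHENTICATED", "READY"),
    ("READY", "CLOSED"),
}

# Hypothetical transitions recovered while reading the implementation.
IMPL = {
    ("INIT", "AUTHENTICATED"),
    ("AUTHENTICATED", "READY"),
    ("INIT", "READY"),  # skips authentication
}

if __name__ == "__main__":
    for kind, transitions in diff_transitions(SPEC, IMPL).items():
        print(kind, transitions)
```

Entries under "undocumented" are the gray areas worth auditing first, since nobody specified how they should behave.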

Auditing skills

We have previously introduced some common strategies for code security auditing, which are mainly used to provide a general strategic direction for code auditing. This section mainly introduces some tips used in actual code auditing work as a supplement to the above code auditing strategies.

Reading Order

When auditing a module code, we can read it by tracking the data flow or tracking the control flow. So which method is better? The answer is that we don’t care about the data flow or the control flow, but only focus on the implementation of this module. Because years of experience have taught us that mental power is an important factor affecting the efficiency of code auditing, and jumping back and forth between different modules often consumes mental energy. For example, in some complex project codes, looking for the implementation of a function will constantly open new files, which will constantly generate new problems that need to be solved. In the process of continuous tracking, people often get lost in the ocean of curiosity and forget the original audit task.

If you really want to get to the bottom of the matter, it is recommended that you make a mark on the audit record first and then conduct an in-depth analysis after completing the audit of the current module.

Revisiting the old place

Schopenhauer once said that important books should be read twice in a row. Because when you read it the second time, you already know the ending, so you can truly understand the beginning. Another reason is that when you read it the second time, you have a different mood and may look at the problem from another perspective.

The same is true for code auditing. We usually need to read a piece of code several times to find every type of vulnerability in it. For example, the first pass may focus on integer overflows, memory management, and format string vulnerabilities; the second pass on functional implementation, such as return value checking and API calls that are easy to misuse (such as strncpy and strlcpy); the third pass on synchronization and race conditions between threads or TOCTOU (time-of-check to time-of-use) issues in resource access, and so on.

There is no standard for how many times a piece of code should be read and audited. It depends on the context in which the code runs. For example, for single-threaded code, you don’t need to worry about thread synchronization. But in any case, you should read the key code at least twice, because a single pass can easily miss important code paths.

Audit Notes

Whether or not to take notes during the audit process actually varies from person to person, but experience shows that keeping notes is very helpful.

Auditing without planning leaves you lost; planning without auditing leaves you in danger. On the one hand, structured notes help you evaluate audit coverage and accumulate audit experience; on the other hand, they make it easy to review previous work quickly when you return for a deeper audit. In addition, whether as an employee or an independent security consultant, we all need to deliver audit reports or vulnerability reports to a manager or client, and these notes are an important source of material.

Idea List

We may have many ideas during the code audit process. For example, when auditing a state machine, we may think that some abnormal state switching may be introduced, resulting in abnormal processing, or some user-controllable data may enter the branch of other modules, which may cause the corresponding module verification error, etc. These ideas cannot be verified one by one during the audit process, otherwise it will deviate from the initial audit plan. Therefore, we also need to maintain a list of potential vulnerabilities, record the above ideas, and indicate the points in the system that "may" have vulnerabilities or be exploited by attackers. This list does not need to be very detailed, it may just be a guess or inspiration. After recording it, you can wait until you have time to continue in-depth analysis and verification later, which can not only ensure the orderly progress of the audit work, but also ensure that your ideas are not forgotten.
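A list like this needs very little structure. The sketch below is one possible shape, with hypothetical class names, field names, and example entries: each idea records where it arose, the guess itself, and whether it has been followed up yet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Idea:
    location: str          # file, function, or module that prompted the idea
    hypothesis: str        # the guess, in a sentence
    status: str = "open"   # "open", "confirmed", or "rejected"


@dataclass
class IdeaList:
    ideas: List[Idea] = field(default_factory=list)

    def note(self, location, hypothesis):
        """Record an idea without interrupting the current audit task."""
        idea = Idea(location, hypothesis)
        self.ideas.append(idea)
        return idea

    def pending(self):
        """Ideas that still need in-depth analysis."""
        return [idea for idea in self.ideas if idea.status == "open"]


if __name__ == "__main__":
    log = IdeaList()
    first = log.note("session state machine", "abnormal state switch may skip a check")
    log.note("upload handler", "user-controlled path may reach another module")
    first.status = "rejected"
    for idea in log.pending():
        print(f"{idea.location}: {idea.hypothesis}")
```

Reviewing `pending()` at the end of a phase fits naturally into the Reflect step of the audit cycle.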

Summary

One of the early priorities of a code security audit is to evaluate your own audit speed (know yourself) and the amount and complexity of the code (know your opponent), so that the audit becomes quantifiable and manageable and can advance steadily. During the audit, use the three-step cycle to continuously increase coverage and improve your understanding of the code structure. After each stage, promptly record what you have learned and what you have thought, so that the work leaves concrete results. You also need to switch audit strategies in a timely manner, and make sensible use of your limited attention and willpower.

Tutorial Boy