Joern for Beginners: A How-To Guide for Source Code Analysis
This article introduces Joern, an open source code analysis platform, as a tool for vulnerability mining, and walks through concrete tools and methods for finding vulnerabilities during code security audits.
Joern 101
Joern is an open source code analysis platform that converts source code into a Code Property Graph (CPG) through a variety of front ends and then lets you query and analyze the CPG with its built-in query language. Readers familiar with CodeQL can think of Joern as an open source counterpart to CodeQL.
Joern supports different languages through different front ends: for example, Eclipse CDT provides fuzzy parsing of C/C++ code, Ghidra handles binary files, and Soot handles Java bytecode. The maturity of the front ends varies, as shown in the following table:
Name | Built with | Maturity |
---|---|---|
C/C++ | Eclipse CDT | Very High |
Java | JavaParser | Very High |
JavaScript | GraalVM | High |
Python | JavaCC | High |
x86/x64 | Ghidra | High |
JVM Bytecode | Soot | Medium |
Kotlin | IntelliJ PSI | Medium |
PHP | PHP-Parser | Medium |
Go | go.parser | Medium |
Ruby | ANTLR | Medium-Low |
Swift | SwiftSyntax | Medium |
C# | Roslyn | Medium-Low |
Joern-cli
git clone https://github.com/joernio/joern
cd joern
sbt stage
package demo;
import a.b.c.Foo;
public class Hello {
static Foo foo = new Foo();
public static void main(String[] args) {
String data = foo.bar(args[0]);
Runtime.getRuntime().exec(data);
}
}
To build the source file into a code property graph (CPG) database, use the joern-parse script. A few notes:
- Select the language javasrc rather than java; java is the Soot-based front end for bytecode.
- You can use joern-parse --list-languages to list all supported front ends (languages).
- Hello.java references the external class a.b.c.Foo, so it does not actually compile, but the CPG is still generated correctly.
def source = cpg.method("main").parameter
def sink = cpg.call("exec").argument
sink.reachableByFlows(source).p
joern> help
val res8: Helper = Welcome to the interactive help system. Below you find
a table of all available top-level commands. To get
more detailed help on a specific command, just type
`help.<command>`.
Try `help.importCode` to begin with.
┌────────────────┬────────────────────────────────────────────────┬─────────────────────────┐
│command │description │example │
├────────────────┼────────────────────────────────────────────────┼─────────────────────────┤
│close │Close project by name │close(projectName) │
│cpg │CPG of the active project │cpg.method.l │
│delete │Close and remove project from disk │delete(projectName) │
│exit │Exit the REPL │ │
│importCode │Create new project from code │importCode("example.jar")│
│importCpg │Create new project from existing CPG │importCpg("cpg.bin.zip") │
│open │Open project by name │open("projectName") │
│openForInputPath│Open project for input path │ │
│project │Currently active project │project │
│run │Run analyzer on active CPG │run.securityprofile │
│save │Write all changes to disk │save │
│switchWorkspace │Close current workspace and open a different one│ │
│workspace │Access to the workspace directory │workspace │
└────────────────┴────────────────────────────────────────────────┴─────────────────────────┘
joern> cpg.help
...
Steps
- cpg.method: represents all method nodes.
- cpg.parameter: represents the parameters of all methods.
- cpg.method("main").parameter: represents the parameters of all methods named main.
- cpg.typeDecl("Foo").method: represents all methods in all classes named Foo.
Scala 101
HelloWorld
object hello {
def main(args: Array[String]) = {
println("Hello, World!")
}
}
@main def hello() = println("Hello, World!")
scalac hello.scala
scala hello
Grammar Speedrun
var a = 1
val b = List(1, 2)
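val c = 1 to 5 // Range
def fn(x: Any) = println(x)
def fn1(x: Int) = { x * x }
def fn2 = 1 to 5
Note the difference between def a = ... and val a = .... A def is re-evaluated on every access, while a val is evaluated once. For an immutable Range such as 1 to 5 this makes no practical difference, but Joern traversals such as cpg.method are Iterators: a traversal stored in a val can only be consumed once, whereas a def builds a new traversal each time. This is why the queries in this article define source and sink with def.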
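The difference is easiest to see with a plain Scala Iterator, which behaves like a Joern traversal in this respect. The sketch below uses only the standard library; the object and member names are illustrative and do not come from Joern.

```scala
object IteratorDemo {
  // A val is evaluated once, so every use shares the same Iterator.
  val once: Iterator[Int] = Iterator(1, 2, 3)
  val first: List[Int] = once.toList   // consumes the iterator
  val leftOver: Boolean = once.hasNext // false: the shared iterator is exhausted

  // A def is re-evaluated on every access, so each use gets a new Iterator.
  def fresh: Iterator[Int] = Iterator(1, 2, 3)
  val second: List[Int] = fresh.toList
  val third: List[Int] = fresh.toList  // still yields all three elements
}
```

In the Joern shell the same reasoning applies to a query stored in a val: once it has been traversed, a second traversal finds nothing left.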
def it = 1 to 5
it.map(x => x * 2)
// A single function argument can be passed in braces instead of parentheses
it.map { x =>
val y = x * 2
println(y)
y
}
// You can use `_` instead of a single parameter:
it.map(_ * 2)
case class Pair(x: Int, y: Int) {
def plus(other: Pair): Int = x + other.x + y + other.y
}
val p1 = Pair(1, 2)
val p2 = Pair(3, 4)
val sum = p1 plus p2 // Use infix syntax
// Equivalent to
val sum2 = p1.plus(p2)
5 + 3
5.+(3)
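"joern".length()
// Empty parentheses can be omitted, so this is generally written as
"joern".length
// Methods declared without parentheses, such as toList, must be called without them
(1 to 5).toList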
trait MyTrait {
def method(): Unit = println("This is a trait method.")
}
case class MyCaseClass(name: String, age: Int)
object MyObject {
def method(): Unit = println("yeah.")
}
sbt
ThisBuild / scalaVersion := "2.13.12"
ThisBuild / organization := "com.example"
lazy val hello = project
.in(file("."))
.settings(
name := "Hello",
libraryDependencies += "org.scala-lang" %% "toolkit-test" % "0.1.7" % Test
)
sbt new scala/scala-seed.g8
sbt compile
sbt test
sbt clean run
Practical Sharing
joern
joern> importCode("vuln-spring")
joern> importCode("xiaomi.apk")
Web Vulnerability Mining
cpg.annotation.where(_.name(".*Mapping")).method.fullName.l
cpg.typeDecl.fullNameExact("org.springframework.jdbc.core.JdbcTemplate").method.l
- nameExact("xxx"): exact string match
- name("xxx"): regular expression match
- nameNot("xxx"): excludes names that match the regular expression
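As a rough illustration of the three filters, the standard-library sketch below mimics them on a list of strings. It is an analogy rather than Joern code: the helper names are hypothetical stand-ins, and it assumes the regular expression has to match the whole name, which is why the patterns in this article use `.*` for prefixes and suffixes.

```scala
object NameFilterDemo {
  val names: List[String] = List("exec", "execute", "main")

  // Hypothetical stand-ins for nameExact / name / nameNot on plain strings.
  def nameExact(value: String): List[String] = names.filter(_ == value)
  def name(regex: String): List[String] = names.filter(_.matches(regex))
  def nameNot(regex: String): List[String] = names.filterNot(_.matches(regex))
}
```

Under this assumption, `name("exec")` selects only `exec`, while `name("exec.*")` also selects `execute`. Because the argument is a regular expression, literal dots in fully qualified names should be escaped.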
cpg.method.fullName("org\\.springframework\\.jdbc\\.core\\.JdbcTemplate\\..*")
// or
cpg.method.filter(_.fullName.startsWith("org.springframework.jdbc.core.JdbcTemplate."))
// Or, more usefully, match the call sites directly
cpg.call.filter(_.methodFullName.startsWith("org.springframework.jdbc.core.JdbcTemplate.")).code.l
def source = cpg.annotation.where(_.name(".*Mapping")).method.parameter
def sink = cpg.call.filter(_.methodFullName.startsWith("org.springframework.jdbc.core.JdbcTemplate.")).argument(1)
sink.reachableByFlows(source).p
def prettyPath = (p: Path) => p.elements.map(
e => String.format("%-20s %s", e.label, e.code)
).mkString("\n")
sink.reachableByFlows(source).map(prettyPath).mkString("\n\n===\n")
METHOD_PARAMETER_IN @RequestParam(name = "password", required = true) String password
IDENTIFIER password
METHOD_PARAMETER_IN String password
IDENTIFIER password
CALL "SELECT * FROM users WHERE USERNAME=\"" + username + "\" AND PASSWORD=\"" + password + "\""
IDENTIFIER query
Android Vulnerability Discovery
jimple2cpg -J-Xmx30g --android $ANDROID_HOME/platforms/android-34/android.jar large.apk -o large.cpg
cpg.typeDecl.fullNameExact("android.content.BroadcastReceiver").derivedTypeDeclTransitive.fullName.l
val baseCls = "android.content.BroadcastReceiver"
val receiverCls = cpg.typeDecl.fullNameExact(baseCls).derivedTypeDeclTransitive.fullName.l
def source = cpg.call.nameExact(Operators.alloc).filter(n => receiverCls.contains(n.typeFullName))
def sink = cpg.call.nameExact("registerReceiver").argument(1)
// Find data flows from source to sink
sink.reachableBy(source)
BroadcastReceiver r = new FooReceiver();
this.registerReceiver(r, filter);
public class ApplicationImpl extends Application {
private AppReceiver mDumpReceiver = new AppReceiver();
@Override // android.app.Application
public void onCreate() {
super.onCreate();
IntentFilter intentFilter = new IntentFilter();
intentFilter.addAction("com.android.traceur.DumpReceiver");
registerReceiver(this.mDumpReceiver, intentFilter);
}
}
def fieldAccess = cpg.fieldAccess.filter(fa => receiverCls.contains(fa.typeFullName) || fa.typeFullName.equals(baseCls))
// Field-read nodes whose type is BroadcastReceiver or one of its subclasses
def fieldRead = fieldAccess.filter(fa => fa.argumentIndex == 2)
// Field reads that actually flow into registerReceiver
val reads = sink.reachableBy(fieldRead).toList
// Collect the class name and field name of each read; the result is a List[List[String]]
val fields = reads.map(r => List(r.typeDecl.fullName.head, r.fieldIdentifier.canonicalName.head))
// Match all writes to the fields collected above
def fieldWrite = fieldAccess
.filter(fa => fa.argumentIndex == 1)
.filter{fa =>
val li = fa.map( r =>
List(r.typeDecl.fullName.head, r.fieldIdentifier.canonicalName.head)
)
li.exists(fields.contains)
}
// Data flow query
fieldWrite.reachableByFlows(source).p
- fieldAccess.argumentIndex indicates the position of the field access in its parent expression: 1 means a write, as in r0.field = value, and 2 means a read, as in r1 = r0.field. This filter is not strictly required, but it speeds up the query slightly on large code bases.
- fieldAccess contains every access to a field whose type is BroadcastReceiver or one of its subclasses. Note that receiverCls only contains the subclasses, not the BroadcastReceiver parent class itself, so the parent class is added explicitly to cover polymorphic declarations.
- fields collects the class name and field name of each field as a List. The two could also be joined into a single string.
- fieldWrite then matches writes to those fields. List.exists checks whether li contains any element of fields.
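The membership test in the last step can be tried outside Joern. The sketch below is plain Scala with made-up class and field names (`com.example.App`, `mReceiver`, `mOther` are hypothetical); it only shows how `exists(fields.contains)` compares the class/field pairs.

```scala
object FieldMatchDemo {
  // Class name and field name of each field that was read into registerReceiver.
  val fields: List[List[String]] = List(
    List("com.example.App", "mReceiver") // hypothetical entry
  )

  // True if any pair derived from a candidate field write is in `fields`.
  def isTrackedWrite(li: List[List[String]]): Boolean =
    li.exists(fields.contains)
}
```

`FieldMatchDemo.isTrackedWrite(List(List("com.example.App", "mReceiver")))` is true, while a pair with the field name `mOther` yields false, because Scala lists compare element by element.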
joern> fieldWrite.reachableByFlows(source).p
val res131: List[String] = List(
"""
┌──────────────────┬──────────────────────────────────────────────────┬────┬────────┬────────────────────────────────┐
│nodeType │tracked │line│method │file │
├──────────────────┼──────────────────────────────────────────────────┼────┼────────┼────────────────────────────────┤
│Call │new com.android.traceur.MainFragment$10 │228 │onCreate│com.android.traceur.MainFragment│
│Identifier │$r17 = new com.android.traceur.MainFragment$10 │228 │onCreate│com.android.traceur.MainFragment│
│Identifier │$r17.com.android.traceur.MainFragment$10(r0) │228 │onCreate│com.android.traceur.MainFragment│
│MethodParameterIn │<init>(this, com.android.traceur.MainFragment $r1)│227 │<init> │com.android.traceur.MainFragment│
│MethodParameterOut│RET │227 │<init> │com.android.traceur.MainFragment│
│Identifier │$r17.com.android.traceur.MainFragment$10(r0) │228 │onCreate│com.android.traceur.MainFragment│
│Identifier │r0.mRefreshReceiver = $r17 │228 │onCreate│com.android.traceur.MainFragment│
│Call │r0.mRefreshReceiver = $r17 │228 │onCreate│com.android.traceur.MainFragment│
└──────────────────┴──────────────────────────────────────────────────┴────┴────────┴────────────────────────────────┘""",
"""
┌──────────────────┬─────────────────────────────────────────┬────┬──────┬───────────────────────────────────┐
│nodeType │tracked │line│method│file │
├──────────────────┼─────────────────────────────────────────┼────┼──────┼───────────────────────────────────┤
│Call │new com.android.traceur.AppReceiver │8 │<init>│com.android.traceur.ApplicationImpl│
│Identifier │$r1 = new com.android.traceur.AppReceiver│8 │<init>│com.android.traceur.ApplicationImpl│
│Identifier │$r1.com.android.traceur.AppReceiver() │8 │<init>│com.android.traceur.ApplicationImpl│
│MethodParameterIn │<init>(this) │38 │<init>│com.android.traceur.AppReceiver │
│MethodParameterOut│RET │38 │<init>│com.android.traceur.AppReceiver │
│Identifier │$r1.com.android.traceur.AppReceiver() │8 │<init>│com.android.traceur.ApplicationImpl│
│Identifier │r0.mDumpReceiver = $r1 │8 │<init>│com.android.traceur.ApplicationImpl│
│Call │r0.mDumpReceiver = $r1 │8 │<init>│com.android.traceur.ApplicationImpl│
└──────────────────┴─────────────────────────────────────────┴────┴──────┴───────────────────────────────────┘"""
)
public class MainFragment extends PreferenceFragment {
// ...
private BroadcastReceiver mRefreshReceiver;
@Override // androidx.preference.PreferenceFragment, android.app.Fragment
public void onCreate(Bundle bundle) {
// ...
this.mRefreshReceiver = new BroadcastReceiver() { // from class: com.android.traceur.MainFragment.10
@Override // android.content.BroadcastReceiver
public void onReceive(Context context, Intent intent) {
MainFragment.this.refreshUi();
}
};
}
@Override // androidx.preference.PreferenceFragment, android.app.Fragment
public void onStart() {
super.onStart();
// ...
getActivity().registerReceiver(this.mRefreshReceiver, new IntentFilter("com.android.traceur.REFRESH_TAGS"), 4);
Receiver.updateTracing(getContext());
}
}
Advanced Operations
Data Flow Semantics
def sink = cpg.call.filter(_.methodFullName.startsWith("org.springframework.jdbc.core.JdbcTemplate.")).argument(1)
However, to remain sound, Joern will treat external methods with no semantic definitions as able to propagate taint from all arguments, to all arguments including the return value.
import io.joern.dataflowengineoss.layers.dataflows.{OssDataFlow, OssDataFlowOptions}
import io.shiftleft.semanticcpg.layers.LayerCreatorContext
import io.joern.dataflowengineoss.semanticsloader.FlowSemantic
val extraFlows = List(
FlowSemantic.from(
"^path.*<module>\\.sanitizer$", // Method full name
List((1, 1)), // Flow mappings
regex = true // Interpret the method full name as a regex string
)
)
val context = new LayerCreatorContext(cpg)
val options = new OssDataFlowOptions(extraFlows = extraFlows)
new OssDataFlow(options).run(context)
- 1, -1 indicates that the data flow of the first parameter is propagated to the return value
- 1, 2 indicates that the data flow of the first parameter is propagated to the second parameter
- 1, 0 indicates that the data flow of the first parameter is propagated to the instance object (this)
- 1, 1 indicates that the data flow of the first parameter is propagated to itself, so it continues past the call. Leaving this mapping out interrupts the flow, which is how a sanitizer is specified
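To make the mapping notation concrete, here is a toy model in plain Scala. It is not Joern's API or its actual algorithm: `taintedAfterCall` and the `Mapping` alias are hypothetical names, and the model only assumes what the list above states, namely that once a method has semantics, taint moves along the listed mappings and nowhere else.

```scala
object FlowSemanticsDemo {
  // Toy model: (src, dst) means taint at position src reaches position dst.
  // Positions follow the article: 0 = receiver (this), -1 = return value,
  // 1, 2, ... = arguments.
  type Mapping = (Int, Int)

  // Positions that are tainted after the call, given the tainted inputs.
  def taintedAfterCall(taintedBefore: Set[Int], mappings: List[Mapping]): Set[Int] =
    mappings.collect { case (src, dst) if taintedBefore.contains(src) => dst }.toSet
}
```

With argument 1 tainted, the mappings `List((1, -1))` leave only the return value tainted, while `List((1, 1), (1, -1))` keep argument 1 tainted as well. An empty list drops the taint entirely, which is the sanitizer case in this model.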
x = source()
foo(x) // with "foo" 1->1 the data flow continues past this call; without it the flow is interrupted
sink(x)
"foo" 1->-1 2->3
"foo" 1 "param1"->2 3 -> 2 "param2"
"foo" PASSTHROUGH 0 -> 0
FlowSemantic("foo", List(PassThroughMapping))
import io.joern.dataflowengineoss.semanticsloader.{FlowSemantic, PassThroughMapping, Parser}
// Call the constructor
val s = FlowSemantic("org\\.springframework.*", List(PassThroughMapping), true)
// Call the static factory method
val s = FlowSemantic.from("org\\.springframework.*", List((1, -1)), true)
// Use the Parser
val parser = Parser()
// Load one or more semantics from a string; returns a list
val extraFlows = parser.parse(""" "foo" PASSTHROUGH 0 -> 0 """)
// Load from a file
val extraFlows = parser.parseFile("semantics.txt")
Control Flow Enhancements
cpg.method.name("exec").repeat(_.caller)(_.emit.dedup).fullName.sorted
val methods = cpg.method
val node1 = methods.next
val node2 = methods.next
node1.addEdge(EdgeTypes.AST, node2)
joern> EdgeTypes.
ALIAS_OF AST CALL CDG CONTAINS IMPORTS PARAMETER_LINK REACHING_DEF SOURCE_FILE
ALL BINDS CAPTURE CFG DOMINATE INHERITS_FROM POINTS_TO RECEIVER TAGGED_BY
ARGUMENT BINDS_TO CAPTURED_BY CONDITION EVAL_TYPE IS_CALL_FOR_IMPORT POST_DOMINATE REF
class ThreadDemo {
public static void main(String args[]) throws Exception {
final String cmd = String.format("sh -c \"%s\"", args[0]);
Thread th = new Thread(new Runnable() {
@Override
public void run() {
System.out.println("Running in a new thread");
try {
System.out.println("return: " + Runtime.getRuntime().exec(cmd));
} catch(Exception ignore) {}
}
});
th.start();
Thread.sleep(1000);
}
}
def source = cpg.method.nameExact("main").parameter
def sink = cpg.call.nameExact("exec").argument
sink.reachableBy(source)
val call = cpg.call("start").head
val target = cpg.method("run").head
diffGraph.addEdge(call, target, EdgeTypes.CALL)
run.commit
joern> sink.reachableByFlows(source).p
val res4: List[String] = List(
"""
┌──────────────────┬────────────────────────────────────────────────────────────────────────────┬────┬──────┬────┐
│nodeType │tracked │line│method│file│
├──────────────────┼────────────────────────────────────────────────────────────────────────────┼────┼──────┼────┤
│MethodParameterIn │main(String[] args) │2 │main │ │
│Call │<operator>.arrayInitializer │3 │main │ │
│Call │<operator>.arrayInitializer │3 │main │ │
│Call │String.format("sh -c \"%s\"", args[0]) │3 │main │ │
│Identifier │String cmd = String.format("sh -c \"%s\"", args[0]) │3 │main │ │
│Identifier │new Runnable() { @Override public void run() { System.out.println("Running │4 │main │ │
│ │in a new thread"); try { System.out.println("return: " + │ │ │ │
│ │Runtime.getRuntime().exec(cmd)); } catch (Exception ignore) { } } } │ │ │ │
│MethodParameterIn │<init>(this, cmd) │4 │<init>│ │
│Identifier │this.cmd = cmd │4 │<init>│ │
│Call │this.cmd = cmd │4 │<init>│ │
│MethodParameterOut│RET │N/A │<init>│ │
│Identifier │new Runnable() { @Override public void run() { System.out.println("Running │4 │main │ │
│ │in a new thread"); try { System.out.println("return: " + │ │ │ │
│ │Runtime.getRuntime().exec(cmd)); } catch (Exception ignore) { } } } │ │ │ │
│Block │$obj0 │4 │main │ │
│Identifier │new Thread(new Runnable() { @Override public void run() { │4 │main │ │
│ │System.out.println("Running in a new thread"); try { │ │ │ │
│ │System.out.println("return: " + Runtime.getRuntime().exec(cmd)); } catch │ │ │ │
│ │(Exception ignore) { } } }) │ │ │ │
│Identifier │th.start() │13 │main │ │
│MethodParameterIn │run(this) │5 │run │ │
│Call │Runtime.getRuntime().exec(cmd) │9 │run │ │
└──────────────────┴────────────────────────────────────────────────────────────────────────────┴────┴──────┴────┘""",
CpgPass
class JumpPass(cpg: Cpg) extends ForkJoinParallelCpgPass[Method](cpg) {
override def generateParts(): Array[Method] =
cpg.method.toArray
override def runOnPart(diffGraph: DiffGraphBuilder, method: Method): Unit = {
method.ast
.filter(_.isInstanceOf[Call])
.map(_.asInstanceOf[Call])
.nameExact("<operator>.goto")
.where(_.argument.order(1).isLiteral)
.foreach { sourceCall =>
sourceCall.argument.order(1).code.l.headOption.flatMap(parseAddress) match {
case Some(destinationAddress) =>
method.ast.filter(_.isInstanceOf[Call]).lineNumber(destinationAddress).foreach { destination =>
diffGraph.addEdge(sourceCall, destination, EdgeTypes.CFG)
}
case _ => // Ignore for now
/* TODO: Ask ghidra to resolve addresses of JMPs */
}
}
}
private def parseAddress(address: String): Option[Int] = {
Try(Integer.parseInt(address.replaceFirst("0x", ""), 16)).toOption
}
}
import io.shiftleft.codepropertygraph.generated.{Cpg, EdgeTypes, PropertyNames}
import io.shiftleft.codepropertygraph.generated.nodes.{Call, Method, StoredNode, Type, TypeDecl}
import io.shiftleft.passes.CpgPass
class DemoPass(cpg: Cpg) extends CpgPass(cpg) {
override def run(diffGraph: DiffGraphBuilder): Unit = {
val call = cpg.call("start").head
val target = cpg.method("run").head
val targetNode = methodFullNameToNode(target.fullName).get
diffGraph.addEdge(call, targetNode, EdgeTypes.CALL)
println(s"Add Edge: $call -> $targetNode")
}
private def nodesWithFullName(x: String): Iterator[StoredNode] =
cpg.graph.nodesWithProperty(PropertyNames.FULL_NAME, x).cast[StoredNode]
private def methodFullNameToNode(x: String): Option[Method] =
nodesWithFullName(x).collectFirst { case x: Method => x }
}
new DemoPass(cpg).createAndApply()
run.commit
joern> project.availableOverlays
joern> project.appliedOverlays
res0: Seq[String] = IndexedSeq("base", "controlflow", "typerel", "callgraph", "dataflowOss")
Other
cpg.help
cpg.method.help
cpg.typeDecl.help
- Node Type Steps - Reference Card - Official documentation
- queries.joern.io - Example query rules for joern-query
.collectAll[Call]
// Equivalent to
.collect { case x: Call => x }
// Equivalent to
.filter(_.isInstanceOf[Call]).map(_.asInstanceOf[Call])
Custom Step
implicit class MyMethodTraversals(method: Traversal[Method]) {
def fooStep = method.fullName(".*org.example.*").isPublic
}
cpg.method.fooStep
Visualization
joern> cpg.method("main").plotDot
plotDotAst plotDotCdg plotDotCfg plotDotCpg14 plotDotDdg plotDotPdg
- AST: Abstract Syntax Tree.
- CDG: Control Dependence Graph, which mainly captures dependencies on control structures such as if/else.
- CFG: Control Flow Graph, all possible execution paths of the program.
- DDG: Data Dependence Graph, which captures data dependencies between statements.
- PDG: Program Dependence Graph, which combines control dependence and data dependence.
cpg.method("main").dotAst.head #> "/tmp/main.dot"