CodeQL is a source code analysis platform that security researchers use to automate the search for bugs and vulnerabilities. CodeQL queries can be run online through the LGTM.com query console.
CodeQL is based on a powerful query language called QL. Understanding QL makes it easier to both read and write CodeQL queries.
CodeQL currently supports the following languages: C/C++, C#, Go, Java, Python, JavaScript, and COBOL.
About QL
QL is a powerful query language that underpins CodeQL. Queries written in QL can find bugs and detect many kinds of security vulnerabilities. For examples of recently discovered vulnerabilities in open source projects, see the GitHub Security Lab.
QL is a logic query language, so it is built from logical constructs. QL uses the common logical connectives (such as `and`, `or`, `not`), quantifiers (such as `forall`, `exists`), and other important logical concepts such as predicates.
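As a small illustration of these connectives and quantifiers, here is a minimal sketch that uses only built-in integers. The ranges are arbitrary choices for the example and do not come from any CodeQL library. It selects the odd numbers between 1 and 10 that are perfect squares.

```ql
from int x
where
  // x ranges over 1..10
  x in [1 .. 10] and
  // x is not even
  not x % 2 = 0 and
  // there is some y in 1..3 whose square is x
  exists(int y | y in [1 .. 3] | y * y = x)
select x
```

The query should return 1 and 9: the squares of 1, 2, and 3 are 1, 4, and 9, and 4 is excluded because it is even.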
QL also supports recursion and aggregation. This lets us write complex recursive queries in simple QL syntax and use aggregates such as `count`, `sum`, and `avg` directly.
To learn more about QL, see About QL and the QL language handbook.
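To make recursion and aggregation concrete, the following sketch defines a recursive predicate and then aggregates over its values. The predicate name `getANumber` and the bound of 100 are illustrative choices, not part of any library.

```ql
// Recursive predicate: evaluates to the integers 0 through 100.
int getANumber() {
  result = 0
  or
  result <= 100 and result = getANumber() + 1
}

// Aggregate over the recursively defined set of values.
select count(int i | i = getANumber()),
  sum(int i | i = getANumber() | i),
  avg(int i | i = getANumber() | i)
```

Because the predicate evaluates to the 101 integers from 0 to 100, the count is 101, the sum is 5050, and the average is 50.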
Basic syntax
The basic syntax of QL looks like SQL, but it is used a little differently.
A query is defined by its `select` clause, which specifies the results the query should output.
A simple query
```ql
select "Hello world"
```
The query simply produces a “Hello world” string.
A more complex query
```ql
from /* ... variable declarations ... */
where /* ... logical formulas ... */
select /* ... expressions ... */
```
For example, the following query returns 42:
```ql
from int x, int y
where x = 6 and y = 7
select x * y
```
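A query can also return several rows. The sketch below follows the same from/where/select shape and searches a small range, chosen arbitrarily for the example, for Pythagorean triples.

```ql
from int x, int y, int z
where
  // Enumerate x <= y <= z within 1..10 to avoid duplicate triples
  x in [1 .. 10] and
  y in [x .. 10] and
  z in [y .. 10] and
  x * x + y * y = z * z
select x, y, z
```

Within this range the only matches are (3, 4, 5) and (6, 8, 10), so the result has two rows with three columns each.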
Some basic concepts
Predicates
Predicates describe the logical relations that make up a QL program. More precisely, a predicate evaluates to a set of tuples. For example:
```ql
predicate isCountry(string country) {
  country = "Germany"
  or
  country = "Belgium"
  or
  country = "France"
}

predicate hasCapital(string country, string capital) {
  country = "Belgium" and capital = "Brussels"
  or
  country = "Germany" and capital = "Berlin"
  or
  country = "France" and capital = "Paris"
}
```
The predicate `isCountry` has arity 1 and evaluates to the set of 1-tuples {("Belgium"),("Germany"),("France")}, while `hasCapital` has arity 2 and evaluates to the set of 2-tuples {("Belgium","Brussels"),("Germany","Berlin"),("France","Paris")}.
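To show how such a predicate is used, the sketch below repeats `hasCapital` so that it is self-contained and then queries it. The extra filter on "France" is added only to show predicates combined with other formulas.

```ql
predicate hasCapital(string country, string capital) {
  country = "Belgium" and capital = "Brussels"
  or
  country = "Germany" and capital = "Berlin"
  or
  country = "France" and capital = "Paris"
}

// Select every (country, capital) tuple of the predicate except France's.
from string country, string capital
where hasCapital(country, capital) and country != "France"
select country, capital
```

The result contains the rows ("Belgium", "Brussels") and ("Germany", "Berlin"). The predicate call restricts `country` and `capital` to the tuples the predicate holds for, which is why the `string` variables need no other bounds.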
Define a predicate
When defining a predicate, you need to specify:
- The `predicate` keyword (for a predicate without a result), or the type of the result (for a predicate with a result).
- The name of the predicate, an identifier that begins with a lowercase letter.
- The parameters of the predicate, if any, separated by commas. Each parameter needs a declared type.
- The body of the predicate.
Predicate without a result
```ql
predicate isSmall(int i) {
  i in [1 .. 9]
}
```
Predicate with a result
```ql
int getSuccessor(int i) {
  result = i + 1 and
  i in [1 .. 9]
}
```
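A predicate with a result is still a relation between its parameters and `result`, so it can be used in either direction. The sketch below reuses `getSuccessor` and asks which input has 5 as its successor. The value 5 is an arbitrary choice for the example.

```ql
int getSuccessor(int i) {
  result = i + 1 and
  i in [1 .. 9]
}

// Find the i whose successor is 5, rather than computing a successor.
from int i
where getSuccessor(i) = 5
select i
```

The query returns 4, since 4 is the only value in the range 1 to 9 whose successor is 5.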
Source
In data flow analysis, the `source` is the point where the data flow begins, for example a place where user-controlled input enters the program.
Sink
The `sink` is the point where the data flow ends, for example a call that is dangerous when it receives untrusted data.
Flow
Data flow models the way data moves through the program at run time, whereas the abstract syntax tree reflects only the structure of the program.
Environment settings
There are two ways to practice writing CodeQL queries: use the LGTM query console or run queries locally.
Query on the LGTM console
Before writing a query, choose the language and the project to analyze.
Finally, write the query and click Run to execute it.
Query locally
To query locally, we need to install the necessary tools.
Install the tools
First download the codeql-cli package and extract it. Next, install the CodeQL extension for VS Code.
The extension needs the CodeQL CLI to execute queries. Point the extension at the CLI by setting the path to the codeql executable in User Settings: the codeql file on Linux, or codeql.exe on Windows.
Finally, add the QL library to the VS Code workspace so we can start writing queries.
Write the query
With everything installed, the last step is to write the query. A query needs a database to run against: as with SQL, there must be data to query before a query can return any results.
Create database
When creating a database, CodeQL analyzes the source code and stores a snapshot of it. To create the database, use the following command:
```shell
codeql database create databases/<database-name> -s projects/<source-code> -l javascript
```
- `codeql`: the executable from the codeql-cli package downloaded above.
- `databases/<database-name>`: path where the database will be created.
- `-s`: path to the source code from which to create the database.
- `-l`: language of the source code to analyze.
Write the query
A query file has to be placed in the appropriate location. For JavaScript source code, put the query file under the path ql/javascript/ql/src.
To make this concrete, we use a simple example: finding an XSS bug in the following JavaScript code.
```javascript
var param = location.hash.split("#")[1];
document.write("Hello " + param + "!");
```
Query to find document.write
```ql
import javascript

from Expr dollarArg, CallExpr dollarCall
where dollarCall.getCalleeName() = "write" and
  dollarCall.getReceiver().toString() = "document" and
  dollarArg = dollarCall.getArgument(0)
select dollarArg
```
Running the query returns the first argument of each call to document.write.
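Matching on `toString()` compares the printed form of an expression, which can be fragile. As a hedged alternative, the sketch below expresses the same search structurally. It assumes the `MethodCallExpr` and `GlobalVarAccess` classes of the standard JavaScript library are available in your version of the QL library; check this against the library you have installed.

```ql
import javascript

// Assumption: MethodCallExpr and GlobalVarAccess come from the standard
// JavaScript QL library; verify the names against your library version.
from MethodCallExpr call
where
  call.getMethodName() = "write" and
  // The receiver must be an access to the global variable `document`
  call.getReceiver().(GlobalVarAccess).getName() = "document"
select call.getArgument(0)
```

On the example code this is intended to select the same argument expression as the query above, while no longer depending on how the receiver is rendered as a string.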
Query to find location.hash.split
```ql
import javascript

from CallExpr dollarCall
where dollarCall.getCalleeName() = "split" and
  dollarCall.getReceiver().toString() = "location.hash"
select dollarCall
```
Data flow analysis
After finding the `source` and the `sink` of the XSS bug, we combine them to find the code paths along which data flows from `source` to `sink`.
```ql
// Needed for CallExpr, DataFlow and TaintTracking
import javascript

class XSSTracker extends TaintTracking::Configuration {
  XSSTracker() {
    // unique identifier for this configuration
    this = "XSSTracker"
  }

  override predicate isSource(DataFlow::Node nd) {
    exists(CallExpr dollarCall |
      nd.asExpr() instanceof CallExpr and
      dollarCall.getCalleeName() = "split" and
      dollarCall.getReceiver().toString() = "location.hash" and
      nd.asExpr() = dollarCall
    )
  }

  override predicate isSink(DataFlow::Node nd) {
    exists(CallExpr dollarCall |
      dollarCall.getCalleeName() = "write" and
      dollarCall.getReceiver().toString() = "document" and
      nd.asExpr() = dollarCall.getArgument(0)
    )
  }
}

from XSSTracker pt, DataFlow::Node source, DataFlow::Node sink
where pt.hasFlow(source, sink)
select source, sink
```
Bonus: visualizing the data flow
This feature requires replacing some of the functions used (`DataFlow::PathNode` instead of `DataFlow::Node`, and `hasFlowPath` instead of `hasFlow`), but the idea stays the same: you still have to identify the `source` and the `sink`. Once the query has run, we can visually follow where our data goes.
```ql
/**
 * @name XSS
 * @kind path-problem
 * @id js/test
 */

import javascript
import DataFlow::PathGraph

class XSSTracker extends TaintTracking::Configuration {
  XSSTracker() {
    // unique identifier for this configuration
    this = "XSSTracker"
  }

  override predicate isSource(DataFlow::Node nd) {
    exists(CallExpr dollarCall |
      nd.asExpr() instanceof CallExpr and
      dollarCall.getCalleeName() = "split" and
      dollarCall.getReceiver().toString() = "location.hash" and
      nd.asExpr() = dollarCall
    )
  }

  override predicate isSink(DataFlow::Node nd) {
    exists(CallExpr dollarCall |
      dollarCall.getCalleeName() = "write" and
      dollarCall.getReceiver().toString() = "document" and
      nd.asExpr() = dollarCall.getArgument(0)
    )
  }
}

from XSSTracker pt, DataFlow::PathNode source, DataFlow::PathNode sink
where pt.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "xss"
```