Learn CodeQL

Tram Ho

CodeQL is a source code analysis platform that security researchers use to automate the search for bugs and vulnerabilities. CodeQL queries can be run online in the LGTM.com query console.

CodeQL is based on a powerful query language called QL. Understanding QL makes it easier to read and write analysis queries with CodeQL.

CodeQL currently supports the following languages: C/C++, C#, Go, Java, Python, JavaScript, and COBOL.

About QL

QL is a powerful query language that underpins CodeQL. Queries written in QL can find bugs and detect different types of security vulnerabilities. To read examples of newly discovered security holes in open source projects, go to the GitHub Security Lab.

QL is a logic query language, so it is built from logical constructs. QL uses the common logical connectives (such as and, or, not), quantifiers (such as forall, exists), and other important logical concepts such as predicates.

QL also supports recursion and aggregation. This allows us to write complex recursive queries using simple QL syntax and to use aggregates such as count, sum, and avg directly.
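As a small illustration of aggregation, the sketch below applies the three aggregates to the integers 1 to 10. The range and the column names (total, sumOfValues, mean) are assumptions chosen for the example, not taken from the original article.

```ql
// Aggregates over the integers 1 to 10 (illustrative range).
select count(int i | i in [1 .. 10] | i) as total, // 10
  sum(int i | i in [1 .. 10] | i) as sumOfValues, // 55
  avg(int i | i in [1 .. 10] | i) as mean // 5.5
```

Each aggregate has the shape aggregate(variables | condition | expression), so the same pattern extends to values taken from a CodeQL database.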

To learn more about QL, see About QL and the QL language handbook.

Basic syntax

The basic syntax of QL looks like SQL, but it is used a little differently.

A query is defined by the select clause, which indicates the desired result at the output.

A simple query

The query simply produces a “Hello world” string.
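A minimal sketch of such a query, consisting of nothing but a select clause:

```ql
select "Hello world"
```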

A more complicated query

For example, a query whose result is 42.
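One way to write a query with that result is sketched below. The variables x and y and the values 6 and 7 are assumptions for illustration; the original only states the result.

```ql
// Declare two integer variables, constrain them, and select their product.
from int x, int y
where x = 6 and y = 7
select x * y
```

The from clause declares variables, the where clause restricts them, and the select clause states what to output, which is the part that resembles SQL.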

Some basic concepts

Predicates

Predicates describe the logical relations that make up a QL program. Strictly speaking, a predicate evaluates to a set of tuples. For example:

The predicate isCountry is the set of 1-tuples {("Belgium"),("Germany"),("France")}, and hasCapital is the set of 2-tuples {("Belgium","Brussels"),("Germany","Berlin"),("France","Paris")}.
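These two predicates could be written as in the sketch below. The predicate bodies follow the tuple sets listed above; the final select is an added assumption that simply prints each country with its capital.

```ql
// The set of 1-tuples: Belgium, Germany, France.
predicate isCountry(string country) {
  country = "Belgium"
  or
  country = "Germany"
  or
  country = "France"
}

// The set of 2-tuples pairing each country with its capital.
predicate hasCapital(string country, string capital) {
  country = "Belgium" and capital = "Brussels"
  or
  country = "Germany" and capital = "Berlin"
  or
  country = "France" and capital = "Paris"
}

from string country, string capital
where isCountry(country) and hasCapital(country, capital)
select country, capital
```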

Define a predicate

When defining a predicate, you need to specify:

  1. The predicate keyword (if the predicate has no result), or the type of the result.
  2. The name of the predicate, an identifier that begins with a lowercase letter.
  3. The parameters of the predicate, if any, separated by commas. Each parameter must be declared with its type.
  4. The body of the predicate.

Predicate without a result
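A minimal sketch of a predicate declared with the predicate keyword. The name isSmall and the range 1 to 9 are assumptions for illustration.

```ql
// Holds for the integers 1 through 9; there is no result value.
predicate isSmall(int i) { i in [1 .. 9] }

from int i
where isSmall(i)
select i
```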

Predicate with a result
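A minimal sketch of a predicate that declares a result type instead of the predicate keyword. The name getSuccessor and the range are assumptions for illustration; result is the special variable that holds the value being returned.

```ql
// For i between 1 and 9, the result is i + 1.
int getSuccessor(int i) {
  result = i + 1 and
  i in [1 .. 9]
}

select getSuccessor(3) // 4
```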

Source

In data flow analysis, a source is the point where a data flow begins.

Sink

A sink is the point where a data flow ends.

Flow

Data flow models the way data moves through the program at run time, whereas the abstract syntax tree reflects the structure of the program.

Environment settings

There are two ways to practice writing CodeQL queries: use the LGTM query console or run queries locally.

Query on the LGTM console

Before writing a query, choose the language and the project.

Finally, write the query and hit run to execute it.

Query locally

To query locally, we need to install the necessary tools.

  1. CodeQL CLI (codeql-cli)
  2. VS Code
  3. CodeQL extension for VS Code
  4. QL library

Install the tool

First, download the codeql-cli archive and extract it. Next, install the CodeQL extension for VS Code.

After installing the extension, it needs to know where the CLI is before it can run codeql commands. Add the path to the codeql executable in User Settings: on Linux this is the codeql file, on Windows the codeql.exe file.

Finally, add the QL library to the VS Code workspace so we can start writing queries.

Write the query

With everything installed, the last step is to write the query. To write a query we need a database: as with SQL, a query can only return results if there is a database to run it against.

Create database

When creating a database, CodeQL analyzes the source code and stores a snapshot of it for querying. To create the database, use a command of the following form:

  codeql database create databases/<database-name> -s <path-to-source> -l <language>

  • codeql : the executable from the codeql-cli package downloaded above
  • databases/<database-name> : the path where the database will be created
  • -s : the path to the source code to build the database from
  • -l : the language of the source code

Write the query

The query file has to be placed in the right location. For a query against JavaScript source code, put the query file under the path ql/javascript/ql/src.

To illustrate, we use a simple example: finding an XSS bug in JavaScript source code.

Query to find document.write
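The original query is not preserved here, so the following is only a sketch of one way to find the sink. It assumes the standard CodeQL JavaScript library (import javascript) and treats the first argument of a document.write call as the sink.

```ql
import javascript

// Find the first argument of every call to document.write.
from DataFlow::Node sink
where sink = DataFlow::globalVarRef("document").getAMemberCall("write").getArgument(0)
select sink, "Argument passed to document.write."
```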

Running the query returns the document.write calls found in the source code.

Query to find location.hash.split
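The original query is not preserved here either. A possible sketch, again assuming the standard CodeQL JavaScript library, matches calls to split on a read of location.hash; it only covers the global location variable, not aliases such as window.location.

```ql
import javascript

// Find calls of the form location.hash.split(...).
from DataFlow::CallNode source
where source = DataFlow::globalVarRef("location").getAPropertyRead("hash").getAMethodCall("split")
select source, "Value derived from location.hash."
```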

Data flow analysis

After finding the source and sink of the XSS bug, we combine them to find code in which data flows from the source to the sink.
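A sketch of how source and sink might be combined with taint tracking. It assumes the class-based TaintTracking::Configuration API of the CodeQL JavaScript library that was current when the LGTM console was available; newer library releases use a module-based configuration instead. The class name XssConfig is hypothetical.

```ql
import javascript

// Hypothetical configuration: location.hash.split(...) flows to document.write(...).
class XssConfig extends TaintTracking::Configuration {
  XssConfig() { this = "XssConfig" }

  override predicate isSource(DataFlow::Node source) {
    source =
      DataFlow::globalVarRef("location").getAPropertyRead("hash").getAMethodCall("split")
  }

  override predicate isSink(DataFlow::Node sink) {
    sink = DataFlow::globalVarRef("document").getAMemberCall("write").getArgument(0)
  }
}

from XssConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "Value from $@ is written to the document.", source, "location.hash"
```

Taint tracking is used rather than plain data flow because the value is usually transformed (indexed, concatenated) on its way from the source to the sink.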

Bonus: Visualizing the data flow

To use this feature, some of the functions used in the query have to be replaced. The idea behind the bug hunt stays the same: you still have to find the source and the sink. Once the query has finished running, we can visually trace where the data goes.
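A sketch of what the replacement might look like, under the same assumption of the older class-based API of the CodeQL JavaScript library. Compared with a plain flow query, the query is marked as a path problem, imports DataFlow::PathGraph, uses DataFlow::PathNode instead of DataFlow::Node, and calls hasFlowPath instead of hasFlow. The metadata values and the name XssConfig are hypothetical.

```ql
/**
 * @name DOM XSS sketch
 * @kind path-problem
 * @id js/example/dom-xss-sketch
 */

import javascript
import DataFlow::PathGraph

// Hypothetical configuration: location.hash.split(...) flows to document.write(...).
class XssConfig extends TaintTracking::Configuration {
  XssConfig() { this = "XssConfig" }

  override predicate isSource(DataFlow::Node source) {
    source =
      DataFlow::globalVarRef("location").getAPropertyRead("hash").getAMethodCall("split")
  }

  override predicate isSink(DataFlow::Node sink) {
    sink = DataFlow::globalVarRef("document").getAMemberCall("write").getArgument(0)
  }
}

from XssConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Value from $@ is written to the document.",
  source.getNode(), "location.hash"
```

The four-part select (element, source path node, sink path node, message) is what lets the results view show each step of the path between the two ends.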

Source: Viblo