How did X expose the personal information of 3 million users?


While discussing the publication of this article, I agreed with Party X that I would not name any company or organization in it. In any case, the real experience of analyzing and hunting for new vulnerabilities is the most valuable thing I want to share with you. So the name is not that important, right?!

About T-Rekt

In this article, in addition to Juno and K-20, there is a new member who has never appeared in previous posts on Juno_okyo's Blog: T-Rekt, the youngest member of J2TeaM. This young man likes to target large companies and corporations. The report on the personal information leak of 3 million users is just his debut; expect certain companies X, Y, Z to be named in Juno_okyo's upcoming analysis articles!

How did T-Rekt find the hole?

Previously, T-Rekt already knew how to read web page source code and often captured requests using the Fiddler tool. Then one day he came up with the idea of debugging an Android application, so he read the documentation and changed the phone's proxy to capture its traffic through Fiddler.

The first Android application that T-Rekt chose to debug was X's. From the start, T-Rekt tried logging in and placing orders in various ways to capture as many requests as possible. From what Fiddler "grabbed", T-Rekt realized something was wrong with the user account information. He also found some small bugs, such as being able to change any user's avatar with only a UserID.

The next day, he sent a message to the team asking for help analyzing the vulnerability he had found. He said he had "discovered a hole in X's API that allows retrieving account information of all users (except passwords)". By the time I read the message, K-20 had already messaged me privately, and that night the two of us analyzed X's system based on the information T-Rekt provided. The result: the vulnerability gives access to the personal information of more than 3 MILLION ACCOUNTS. This information includes: full name, gender, date of birth, email, phone number, ID number, home address, …


Reverse engineering X's Android app

I quickly Googled and found X's app on Google Play. Instead of analyzing the application through its HTTP requests as T-Rekt did, I decompiled it to Java code to analyze the source directly. That way there is no need to install the application on a phone; I just downloaded its APK file to my computer. This is quite simple: searching Google for "apk downloader" is all it takes.

An APK is an extension of the JAR and ZIP formats, so ordinary archive software like WinRAR or WinZip can open it and reveal the resources inside. Directory tree structure:

  • assets
  • com
  • fabric
  • lib
  • res
  • AndroidManifest.xml
  • classes.dex (most important file)
  • resources.arsc
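Because an APK is a ZIP archive, the same listing can also be produced with Python's standard zipfile module instead of WinRAR. The following is a minimal sketch; the function name and the stand-in archive built in memory are illustrative assumptions, not part of the original analysis.

```python
import io
import zipfile


def list_top_level_entries(apk):
    """Return the sorted top-level names inside an APK (path or file-like object)."""
    with zipfile.ZipFile(apk) as archive:
        # Keep only the first path component of every archive member.
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})


if __name__ == "__main__":
    # Build a tiny stand-in archive in memory so the sketch runs without a real APK.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"")
        archive.writestr("classes.dex", b"")
        archive.writestr("res/layout/main.xml", b"")
    buffer.seek(0)
    print(list_top_level_entries(buffer))
```

Passing the path of a downloaded APK file instead of the in-memory buffer should list the same kind of entries shown in the tree above.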

I used the dex2jar tool to convert classes.dex into classes.jar, then opened it in jd-gui to browse the decompiled source structure.

Analyze the Android application of X

So we have the Java source code of the application, but there are too many classes in each package, and reading through them one by one would be time-consuming. I used jd-gui's search function with an endpoint that T-Rekt had provided:

… No results. An API system usually has more than one endpoint, so the programmer typically assigns the Base URL to a variable or constant. E.g.:

const BASE_URL = '';

Then the endpoints are created by string concatenation. With that in mind, I searched again with the "customer / profile / id" substring:


Instant results! Hehe. I saved these two classes separately and reopened them in jd-gui for further analysis:


There are a lot of endpoints. As expected, the Base URL is stored somewhere else, so this class contains only the path substrings that follow it.
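To illustrate why the search for a full endpoint found nothing while the substring search succeeded, here is a small Python sketch. The Base URL, the exact path spelling (written without spaces and with a trailing slash), and the decompiled line are all hypothetical placeholders; the article does not give the real values.

```python
# Hypothetical values: the real Base URL and endpoint list are not given in the article.
BASE_URL = "https://api.example.invalid/"
PROFILE_PATH = "customer/profile/id/"


def profile_url(user_id):
    """Build a full endpoint URL by concatenating the base and the path."""
    return BASE_URL + PROFILE_PATH + str(user_id)


# What a decompiled class might contain: only the path constant, not the full URL.
decompiled_class_text = 'public static final String PROFILE = "customer/profile/id/";'

full_url = profile_url(10000)
print(full_url in decompiled_class_text)      # the full URL never appears as one literal
print(PROFILE_PATH in decompiled_class_text)  # the path substring does
```

The full URL only exists at runtime after concatenation, so a text search over the decompiled code can match the path fragment but not the complete address.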

The HTTP queries of this application are all processed in a single class. I went after the most sensitive item first, the password, so I immediately pressed Ctrl + F and typed the keyword "password".

The first method that appears is changePassword. It follows the usual pattern, so it requires a current_password parameter. Although this method cannot be exploited to change any user's password outright, a hacker could brute-force the current_password parameter to target a user with a known email address.

The next method is forgetPassword, used to recover the password; the only parameter it requires is email. This method can be exploited if the hacker has already taken over the victim's email, and from there take over the account through the password recovery feature.

Most of the other methods query user information such as transaction history, accumulated points, … And finally, the most interesting method is login, which can be used for brute force.

The register method can also be used to spam new accounts, but this kind of spam is not very interesting to hackers because it is purely destructive; for script kiddies, it is probably another story.

Having gone through all the password-related methods, I moved on and immediately saw the most important method of this article:

The puzzling thing is that while most other query methods take a usersessionid parameter (the current user's session ID) to determine who is querying, the method that fetches the profile requires only one parameter: the user ID. In other words, the method that returns the most information about a user has the weakest security. Simply changing the ID in the URL is enough to view any user's information.
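The difference between the two kinds of methods can be sketched in a few lines of Python. Everything here is hypothetical (the data, the function names, and the way sessions are stored); it only illustrates the contrast between an ID-only lookup and one that checks who is asking, not how X's backend is actually written.

```python
# Hypothetical in-memory data standing in for X's backend; none of it comes from the article.
PROFILES = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
SESSIONS = {"session-of-alice": 1}  # session ID -> owning user ID


def get_profile_id_only(user_id):
    """Mirrors the flaw: whoever knows (or guesses) an ID gets the profile."""
    return PROFILES.get(user_id)


def get_profile_with_session(user_id, usersessionid):
    """Returns the profile only if the session belongs to the requested user."""
    if SESSIONS.get(usersessionid) != user_id:
        return None  # unknown session, or a session owned by someone else
    return PROFILES.get(user_id)


print(get_profile_id_only(2))                           # Bob's data, no questions asked
print(get_profile_with_session(2, "session-of-alice"))  # refused: not Alice's profile
```

With the second variant, changing the ID in the request is no longer enough, because the server ties the requested ID to the caller's own session.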

Assuming the programmer did not want to use the session ID, the access token could also be used for authentication. The login method returns an access token, but it looks like the only place it is then used is to log the user out of the application:


A good example is Facebook's API system: <ID> (you can only query it when passing the access_token parameter)


Write code to exploit vulnerabilities automatically

As analyzed, we can now query any user's information by user ID and brute-force accounts by email address. I went on to write exploit code for both in Python.

Enumerating X's user information

Just create a function that queries the endpoint with [USER_ID] and then use a loop to scan from Min to Max. The problem now is to find the values of Min and Max. A hacker with free time could scan from 1 up to some very large ID, but that wastes time on queries for IDs that do not exist.

Finding them is not difficult; I search the same way as when finding the number of columns to exploit an SQL injection. First I tried ID 10000 (X is a large system, so I guessed high on the first attempt). The result was error code 400 with the message "Id param is not provided". What? The URL clearly already had an ID.

Looking back at the source code of the Android application, the callGetApi method contains this snippet:


So every request has an "X-Device" header added to it. T-Rekt had mentioned this in his message to me, but I forgot it when coding the PoC. I added the header right away, and at that point I also discovered that its value does not have to be Android or iOS as T-Rekt said; it can be set to anything, even an empty string.

In other words, X's server only checks whether this header exists, regardless of its value, something like: if (isset($header)) {...}
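A presence-only check of this kind can be sketched in Python as follows. The function name and the plain dictionary of headers are assumptions for illustration (real servers also treat header names case-insensitively); only the header name and the 400 message come from the behavior described above.

```python
def check_device_header(headers):
    """Return (status, message) the way a presence-only check would."""
    # Only the presence of the key matters; the value is never inspected.
    if "X-Device" not in headers:
        return 400, "Id param is not provided"
    return 200, "OK"


print(check_device_header({}))                       # rejected: header missing
print(check_device_header({"X-Device": "Android"}))  # accepted
print(check_device_header({"X-Device": ""}))         # accepted too: empty value passes
```

A check like this filters out nothing beyond the most naive requests, since any client can add the header with an arbitrary value.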

I tried the IDs 10000, 100000, 1000000, … (adding one more 0 each time), and each returned user information. At 10 million, I received a 404 error code and the message "The customer does not exist" (i.e. this ID does not exist). So the value lies in the range 1 million < Max < 10 million.


I took a value in the middle, 5 million, and still received a 404 error. From experience, I was careful to also replace the trailing 0 with a 1, in case the round (even) ID returned 404 while neighboring odd IDs existed. Repeating this until Max was pinned down gave a number of over 3 million users.
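The narrowing described here is essentially a bisection. Below is a minimal Python sketch of the idea, with the HTTP check replaced by a stand-in function so that it runs offline; the function names and the pretend maximum are invented for the demo, and the sketch assumes IDs are contiguous near the top of the range, which is why the author additionally probed neighboring IDs by hand.

```python
def find_max_id(exists, low, high):
    """Bisect for the largest ID where exists(id) is True.

    Assumes exists(low) is True, exists(high) is False, and that IDs are
    contiguous near the top, so the predicate flips only once in (low, high).
    """
    while high - low > 1:
        mid = (low + high) // 2
        if exists(mid):
            low = mid   # mid is still a valid ID, so Max is at or above it
        else:
            high = mid  # mid is past the end, so Max is below it
    return low


# Stand-in for the HTTP check (200 vs. 404); the number is invented for the demo.
PRETEND_MAX = 3_141_592
print(find_max_id(lambda i: i <= PRETEND_MAX, 1_000_000, 10_000_000))
```

Each step halves the remaining range, so the span between 1 million and 10 million is settled in a couple of dozen probes rather than millions.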

Finding Min works the same way but over a much smaller range; it turned out to be 1779. Then use a loop to scan the whole range:



Brute-forcing X user accounts

This is a form of attack that generates random passwords and tries to log in with each one, repeating until a login succeeds. From the information analyzed above, we already know the login endpoint:

The login parameters are the email address and the password. The email comes from the enumerated information above; just pick any one. Or, with enough time and a virtual server (VPS) to leave running, both forms of attack can be combined: loop over the user IDs to collect emails, then call the brute-force function with each email address. That way a hacker could obtain the full personal information, plus the password wherever the brute force succeeds, of more than 3 million accounts.

PoC (Proof of Concept)



19/09/2016 2:12 PM The vulnerability is reported to X.
25/09/2016 9:07 PM Since I had not received any feedback, I decided to create a website using X's own API to help people check whether they were among the 3 million people whose personal data could be leaked.
26/09/2016 11:18 AM GenK published a post about the personal data leak, followed by a series of other news articles.
26/09/2016 11:03 PM I sent a second report.
27/09/2016 10:31 AM X responded and asked for contact information to discuss directly with me.
27/09/2016 5:25 PM I provided contact information as X requested.
27/09/2016 10:09 PM A representative of X contacted me. I shared the team's whole process of discovering and analyzing the vulnerability. The representative expressed the wish to give us a reward (of course we did not refuse).
28/09/2016 11:10 PM I reported a small bug on X's website. At this point, X's API vulnerability had been fixed.

ITZone via Juno_okyo
