Article by Felipe Daragon and Syhunt Icy Team. February 12, 2021
Thanks to the more than 8,000 companies that contacted us after the initial article and requested more information about how they were exposed by the leak. While we wait for the authorities' next moves, we continue to monitor the news and updates regarding the leak. Below you can find our key findings and analysis of the leak.
Through our expert analysis and participation in a series of articles by the media, we helped highlight the dimension of the mega leak that exposed data from almost all Brazilians in January 2021.
We concluded that between 673GB and 873GB (nearly 1TB) of data about Brazilian companies and individuals was stolen in 2020 and compiled into a single archive, likely from multiple leaks that occurred over time. As a result, key details of a staggering 223 million Brazilian individuals and 40 million Brazilian companies were exposed and are being actively sold by cybercriminals on Internet forums and the Dark Web.
Following a request by the Estadão newspaper, we analyzed the case together with the newspaper. We published a numerical breakdown of the leak, among other relevant details, with the intention of informing the public and businesses and raising awareness about the troubling scale of the leak.
Days later, the publication of a second article by Estadão prompted the Brazilian Supreme Federal Court to order an investigation and the blocking of access to the cybercriminal's posts and links. An investigation by the Brazilian authorities has been underway since then.
Additional analyses by Syhunt in partnership with the newspaper revealed that (1) the face pictures in the cybercriminal's archive were actually copied from DivulgaCand and (2) half a million corporate mobile numbers were exposed.
The leak in numbers
Following Estadão's request, we processed the catalog and small samples published by the cybercriminal, simulated individual CSV exports and performed a variety of calculations to confirm the cybercriminal's claims, build an overall "picture" of the leak, and closely estimate the full size of the leak and of the databases in the hands of the cybercriminal:
|Total of Brazilian Individuals Exposed||Total of Brazilian Companies Exposed||Total of Vehicles Exposed|
|37||17||1 (48 col.)|
|Information Categories||Information Categories||Information Categories|
|Est. Uncompressed People Database Size||Est. Uncompressed Business Database Size||Uncompressed Database Size|
|Est. Data Size Per Person (Without Face Pic)||Est. Data Size Per Company||Approx. Data Size Per Vehicle|
|Total of People in Leaked Samples||Total of Companies in Leaked Samples||Total of Vehicles in Leaked Samples|
|1.1M (16GB est.)||N/A||N/A|
|Total of Face Pictures in Database||Total of Face Pictures in Database||Total of Face Pictures in Database|
The leak in numbers: Phone numbers
|Total of Brazilians with Phone Details Exposed (Mobile/Landline): 159,845,321||Total of Companies with Phone Details Exposed (Mobile/Landline): 28,695,845|
|Total of Mobile Phone Numbers in Leaked Samples||Total of Corporate Mobile Phone Numbers in Leaked Samples|
Total of Leaked Corporate Mobile Phone Numbers in Samples - By State
|Rio Grande do Sul||17,802|
|Rio de Janeiro||14,721|
The source of the leak
We named this leak BLB20 (Big Leak of Brasil 2020) because the cybercriminal's data is current through 2020. Much has been speculated about the source of the leak, and we will likely learn more about it as the investigation by the Brazilian authorities, information security companies and media organizations progresses.
Part of this data breach may have been an inside job, carried out deliberately and maliciously by an employee of a firm, an opinion shared by many security researchers. We believe that cybercriminals or some analytics company compiled various leaks that happened over the years into a single archive. We concluded, and Estadão later confirmed, that the face pictures in the database were copied from TSE's DivulgaCand, which appears to confirm the compilation of data from multiple leaks and sources.
The cybercriminal referred to his archive as the Serasa Experian database. Serasa Experian is a major Brazilian credit research firm, but the company stated that it carried out an internal investigation and that the data in the leaked archive doesn't match the data found in the company's database.
Another mega leak? On January 10, reports of a second mega leak emerged. Although they come from a credible source that alerted about the first leak, we have not been able to confirm the new leak due to the lack of references. This analysis and article are about the first leak only.
The Leaked Information Categories
The following are the categories of information revealed in the leak and the estimated size of each individual database:
Business / Legal Entity Data
|Data Set Name||Description||Estimated Size|
|01 - Basic||CNPJ, corporate name, trade name, registration (head office / branch, situation), date of foundation, number of employees, size, legal nature||8.3GB|
|02 - Email||2.9GB|
|03 - Telephone||Area code, number, operator, plan, line type (fixed, prepaid, postpaid), installation date||48.2GB|
|04 - Address||Street address, number, neighborhood, city, state, zip code, type (Residential / Commercial), latitude and longitude||8.5GB|
|05 - Mosaic||Targeting group and subgroup||1.7GB|
|06 - Business||Name and CPF of the company’s partners, participation (shares and %), date of entry into the company||45.9GB|
|07 - IRS||Foundation date, registration status (Active / Closed / Unfit)||5.5GB|
|08 - Credit Score||Risk score, risk level (Low / Medium / High)||2.2GB|
|09 - Legal Representative||CPF and name of representative, registration status (Active / Closed / Unfit)||2.0GB|
|10 - Checks without Funds||Bank code and branch, reason (No funds / Account closed)||0.1GB|
|11 - Operating Class||Hours of operation (24h, commercial 9 am to 6 pm, lunch, night etc.), type of distribution (physical retail, online retail, physical wholesale)||0.2GB|
|12 - Simples Nacional and SIMEI||Status (Opted in / Not opted in)||4.3GB|
|13 - Legal Nature||Corporation, individual entrepreneur, cooperative, public agency, etc.||2.6GB|
|14 - Share Capital Value||1.7GB|
|15 - Debtors||Type (principal, co-responsible), responsible unit, registration, type of credit (fine, IRPJ, COFINS, CSLL etc.), amount||9.5 - 20 GB|
|16 - Sintegra||State registration number, activity start date, registration status||1.4GB|
|17 - CNAE||3.8GB|
|All Data Sets - Approx. Total Size||150 - 200GB|
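As a rough cross-check of the table above, the per-data-set estimates can simply be added up. The sketch below is illustrative Python, not the software used for the original analysis, and its variable names are assumptions made for the example. The listed rows sum to roughly 149 to 159 GB, which sits at the lower end of the stated 150 - 200GB total.

```python
# Estimated size of each business data set in GB, as (low, high),
# keyed by the data set number used in the table above.
# Only data set 15 (Debtors) is given as a range in the table.
business_sets_gb = {
    1: (8.3, 8.3),
    2: (2.9, 2.9),
    3: (48.2, 48.2),
    4: (8.5, 8.5),
    5: (1.7, 1.7),
    6: (45.9, 45.9),
    7: (5.5, 5.5),
    8: (2.2, 2.2),
    9: (2.0, 2.0),
    10: (0.1, 0.1),
    11: (0.2, 0.2),
    12: (4.3, 4.3),
    13: (2.6, 2.6),
    14: (1.7, 1.7),
    15: (9.5, 20.0),
    16: (1.4, 1.4),
    17: (3.8, 3.8),
}

# Sum the low and high ends separately to get a total range.
low_total = sum(low for low, _ in business_sets_gb.values())
high_total = sum(high for _, high in business_sets_gb.values())

print(f"sum of listed data sets: {low_total:.1f} - {high_total:.1f} GB")
```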
People / Individual Data
- 01 - Basic: person's name, CPF, gender, date of birth, father’s name, mother’s name, marital status (married, single, divorced, widowed, others)
- 02 - Email
- 03 - Telephone: Area code, number, operator, plan, line type (fixed, prepaid, postpaid), installation date
- 04 - Address: street address, number, neighborhood, city, state, zip code, type (residential / commercial), latitude and longitude; households: CPF of householder, number of persons, income bracket, full address; schooling: level (illiterate / elementary / technical / higher etc.)
- 05 - Mosaic: targeting group and subgroup
- 06 - Occupation: position, number CBO (Brazilian Classification of Occupations)
- 07 - Credit Score: credit activity, risk score, risk level (Low / Medium / High)
- 08 - RG (Identity Card)
- 09 - Voter Title: registration number, zone, section, address, county, state
- 10 - Education
- 11 - Business: name of the partner of a company, participation (shares and %), corporate name and trade name of the company, CNPJ, date of entry into the company
- 12 - IRS: registration status (Regular / Suspended / Canceled / Deceased Holder)
- 13 - Social Class: A1, A2, B1, B2, C1, C2, D, E
- 14 - Marital Status: married, single, divorced, widowed, others
- 15 - Job: CNPJ and corporate name of the employer, PIS / PASEP / NIT number, CTPS number, type of employment (CLT, self-employed, civil servant, apprentice etc.), date of admission, salary, hours of work per week
- 16 - Affinity: accuracy level, percentile
- 17 - Analytical Model: predicts chance of consumer having affinity to buy a product or service
- 18 - Purchasing Power: level (low, medium, high), income, salary
- 19 - Photos of Faces: 1,176,157 JPEG images with dates between 2012 and 2020; the file name is the CPF of the corresponding person
- 20 - Public Servants: job description, capacity, exercise, gross income, status, bond, removal (Yes / No)
- 21 - Checks without Funds: bank code and branch, reason (No funds / Account closed)
- 22 - Debtors: name, type of debtor (principal, co-responsible), situation (active, in collection, filed), type of debt (fine, income tax, PIS etc.), amount, did it end up in court? (Yes / No)
- 23 - Family Grant: amount, status of benefit (Released / Blocked), status of benefit (Active / Inactive), number and name of dependents, NIS (Social Identification Number)
- 24 - University / College Students: 1,643,105 people with college name, course, year of entry and year of completion
- 25 - Advisers: 2,260,960 people who provide consultancy in the public or private sphere, including situation, specialty and occupation code
- 26 - Households: all the people who share the same address
- 27 - Family Bond: categorizes people according to a first degree (mother, father, son, daughter, brother, sister, spouse) or second degree (grandfather, grandson, uncle, nephew, cousin, etc.)
- 28 - LinkedIn: 5,051,553 social network profiles with ID number and access URL
- 29 - Salary: value, type (monthly, biweekly, weekly, etc.), hours per week
- 30 - Income: monthly amount (includes salary, rent, interest, etc.), social class (low, medium, high), income range
- 31 - Deceased: date of death, age, date of death certificate, name and address of the registry office.
- 32 - IRPF (Income Tax): bank institution name, branch code, refund lot
- 33 - INSS: insured’s name, benefit number, start date, type (retirement, pension, maternity salary, etc.)
- 34 - FGTS: PIS number
- 35 - CNS (National Health Card)
- 36 - NIS (Social Identification Number)
- 37 - PIS / PASEP
All Data Sets - Approx. Total Size: 500 - 650GB
Vehicle Data
- ID: internal database number
- Kind of Person: physical or legal
- Update Date: varies from 1993 to 2020
- License Plate: in old or new format
- Municipality and state (UF) of the license plate
- Vehicle Situation
- Restrictions: without restriction, restricted by theft, pledge, fiduciary alienation, etc.
- Chassis Number
- Chassis Situation: Normal, Restricted
- Engine Number
- Gearbox Number (if applicable)
- Body Number (if applicable)
- Body Type: open, closed, jeep, van, double cab, motorcycle, etc.
- Invoiced Document Type
- Billed UF
- Billed: contains sequence of numbers related to the invoiced document, such as invoice
- Brand and Model: there are 37 thousand different models
- Model Year
- Year of Manufacture
- Vehicle Color
- Vehicle Type: bicycle, moped, scooter, motorcycle, automobile, bus, truck, etc.
- Kind of Vehicle: passenger, cargo, mixed, traction, collection etc.
- Fuel: gasoline, alcohol, diesel, natural gas, electric, etc.
- Power: power in HP
- Maximum Traction Capacity
- Total Gross Weight
- Battery Capacity
- Number of Passengers
- Number of Axles
- Nationality: domestic or imported
- DI: import declaration
- Importer’s Identity
- Type of document of the importer
How we got to the numbers
We produced the above analyses and estimates in collaboration with the media, including Estadão, Folha de São Paulo and Tecnoblog. As long-time information security researchers and professionals, we acted responsibly throughout: we did not seek to contact the cybercriminal or to purchase data sets from him, and we did not obtain a copy of the archive whose full size we estimated above. We also did not seek to profit financially from the leak in any way.
- Est. Data Size Per Person (Without Face Pic) and Est. Data Size Per Company: based on the samples and data set catalog provided by the cybercriminal, we simulated a CSV export of data of single individuals and companies. We concluded, for example, that leaked business data about Syhunt itself was around 7.33 KB of text data. After examining the size of multiple simulated exports, we estimated the data size per person and per company.
- Approx. Data Size Per Vehicle: the typical size of each line of the leaked vehicles archive.
- Total of Face Pictures in Database - 20GB est: we divided the size in bytes of the sample photo archive (17.3 MB) by its 1,334 JPEG files. Then we multiplied the result by the number of face pictures available in the full archive (1.1M, or to be more exact, 1,176,157).
- Est. Uncompressed People Database Size: we multiplied the estimated data size per person in bytes by the total of Brazilian individuals exposed. We also added the estimated uncompressed size of face pictures in the database.
- Est. Uncompressed Business Database Size: we multiplied the estimated data size per company in bytes by the total of Brazilian legal entities exposed. We also processed the cybercriminal's catalog information with software, per data set column, and generated the per-data-set estimates listed in this article.
- Est. Uncompressed Database Size (All Databases) - Nearly 1 TB: the sum of the People, Business and Vehicles estimated database sizes.
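The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used for the original analysis: the helper names and the example record are hypothetical, megabytes are treated as decimal (the article does not say whether MB means 10^6 or 2^20 bytes), and the vehicle database size, which is not itemized in the text above, is back-solved from the stated totals. Under those assumptions the face picture extrapolation comes out at roughly 15 GB, close to the 16GB estimate shown in the table.

```python
import csv
import io

MB = 10**6  # decimal megabyte (assumption; the article does not specify)
GB = 10**9


def csv_row_size_bytes(values) -> int:
    """Size in bytes of one record written as a single UTF-8 CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(values)
    return len(buffer.getvalue().encode("utf-8"))


def extrapolate_bytes(sample_bytes: float, sample_count: int, full_count: int) -> float:
    """Average size per sample item, scaled up to the full item count."""
    return sample_bytes / sample_count * full_count


# Hypothetical, made-up record used only to show the per-record measurement.
example_record = ["00.000.000/0001-00", "Example Ltda", "Example", "2003-01-01"]
example_size = csv_row_size_bytes(example_record)

# Face pictures: sample archive size, sample file count and full count
# are the figures quoted in the article.
face_pictures_bytes = extrapolate_bytes(17.3 * MB, 1_334, 1_176_157)

# Totals quoted in the article, as (low, high) ranges in GB.
people_gb = (500, 650)
business_gb = (150, 200)
all_databases_gb = (673, 873)

# Back-solved vehicle database size: total minus people minus business.
implied_vehicles_gb = tuple(
    total - people - business
    for total, people, business in zip(all_databases_gb, people_gb, business_gb)
)

print(f"example CSV line: {example_size} bytes")
print(f"face pictures: {face_pictures_bytes / GB:.1f} GB")
print(f"implied vehicles database: {implied_vehicles_gb} GB")
```

Measuring the encoded CSV line, rather than counting characters, matters for Brazilian data because accented characters take more than one byte in UTF-8.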
This is the biggest and most serious data leak that Brazil has ever experienced. Syhunt recommends real, immediate and continuous efforts, by the government and private sector, to vigorously respond to this leak, which must include, among other things:
- Accelerate response to this leak and future leaks.
- Suppress the selling of the leaked information.
- Prevent the leaked data from being actively exploited by criminals.
- Create new mechanisms to detect, monitor and report leaks.
- Cooperate internationally with other law enforcement agencies.
- Discuss, and put in place, concrete countermeasures with the help of key information security companies and professionals.
About Syhunt Security
With next-generation assessment technology, Syhunt established itself as a leading player in the web application security field, delivering its assessment tools to a range of organizations across the globe, from the SMB to the enterprise. Syhunt products help organizations defend against the wide range of sophisticated cyberattacks currently taking place at the Web application layer.
Syhunt proactively detects vulnerabilities and weaknesses that lead to data leaks or breaches - Syhunt tools focus on the many angles and views that can be used for evaluating the security state of a web application, such as its live version (through dynamic analysis / DAST), source code (SAST), server log (proactive forensics) and configuration (hardening).
Syhunt's founder Felipe Daragon started his career working as a security consultant for government organizations and corporations in the 90s. In the beginning of his career he worked for leading information security firms in Brazil. Daragon's last 22 years in the information security industry were dedicated to proactively defending companies and government agencies from attacks and to raising awareness about pressing security issues and new cyberattack trends.
References & Thanks
- Thanks to Paulo R. Santos (Jump2) and Mario C. Fialho for participating in the analyses together with Syhunt and the newspapers.
- Thanks to Felipe Ventura for the first detailed analyses of the leak, which were posted by Tecnoblog as part of two articles and started to highlight the dimension of the leak. Thanks to Renato Kopke for sending me the links to the articles.
- Thanks to Roberto F. Marc (Syhunt) for reviewing the calculations.
- Megavazamento de dados de janeiro expôs mais de 500 mil celulares corporativos, Gizmodo, February 11, 2021
- Megavazamento de janeiro fez meio milhão de celulares corporativos circularem na internet, Estadão, February 10, 2021
- Fotos de megavazamento são de políticos que se candidataram entre 2012 e 2020, Canaltech, February 5, 2021
- Fotos em megavazamento de dados são de candidatos nas eleições entre 2012 e 2020, Estadão, February 4, 2021
- PF investiga venda de dados de Bolsonaro e de ministros do STF, CNN, February 3, 2021
- Após megavazamento, dados de ministros do Supremo são postos à venda, Conjur, February 2, 2021
- Dados vazados podem render R$ 80,8 milhões ao criminoso, Folha de São Paulo, February 2, 2021
- Dados de Bolsonaro e ministros do STF estão à venda na internet após megavazamento, Estadão, February 1, 2021
- Após vazamento, dados de 40 mil pessoas já circulam na internet, CNN (via Estadão), January 29, 2021
- Após megavazamento, dados de 40 mil brasileiros já circulam na internet, Estadão, January 28, 2021
- O que há no vazamento que afetou 40 milhões de CNPJs, Tecnoblog, January 22, 2021
- Vazamento que expôs 220 milhões de brasileiros é pior do que se pensava, Tecnoblog, January 22, 2021
References Translated (In English)
- Details of the leak on 100 million vehicles in Brazil, January 25, 2021
- Leak that exposed 220 million Brazilians is worse than previously thought, January 22, 2021
- What's in the leak that affected 40 million CNPJs, January 22, 2021