Securing the Data Jungle: Lessons from the DeepSeek ClickHouse Exposure

This week gave me a unique opportunity to break down DeepSeek R1 from several angles. As the week progressed, more issues came to light, including the unsecured ClickHouse database discovered by Wiz Security (Wiz Research DeepSeek Database Leak). At least it was the good guys who found it first!

Let’s break it all down: the problem, the risks, and most importantly, the solutions that any company can apply to avoid similar exposure. The great news is that there are tried-and-trusted players who offer exceptional products to secure your infrastructure.


The Incident Recap: Data, Data Everywhere

During a routine vulnerability scan, the Wiz team identified unsecured ports (8123 and 9000) that allowed them to query DeepSeek’s ClickHouse database. These ports were publicly accessible, meaning the database was not protected by a firewall or WAF (Web Application Firewall). Worse, there was no authentication configured. When Wiz dug deeper, they discovered that the database contained a massive amount of sensitive data, including:

  • Plaintext chat histories between users and DeepSeek’s AI systems.
  • API keys and cryptographic secrets.
  • Server directory structures and operational metadata.
  • References to internal API endpoints.

The researchers were not immediately locked out, suggesting the absence of real-time monitoring.
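
To make concrete just how low the bar was, here is a minimal sketch of querying an unauthenticated ClickHouse HTTP interface on port 8123. The address is a documentation placeholder, not DeepSeek’s actual host:

```python
# Minimal sketch: querying an exposed, unauthenticated ClickHouse HTTP endpoint.
# 203.0.113.10 is a placeholder documentation address, not a real target.
import requests

CLICKHOUSE_URL = "http://203.0.113.10:8123/"

# With no authentication configured, ClickHouse falls back to its built-in
# "default" user, so a bare query parameter is enough to enumerate tables.
resp = requests.get(CLICKHOUSE_URL, params={"query": "SHOW TABLES"}, timeout=5)
print(resp.status_code)
print(resp.text)  # one table name per line if the instance is exposed
```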

Additionally, follow-up reports revealed that Wiz struggled to contact DeepSeek because the company had no published security contact or email address. The researchers resorted to sending LinkedIn messages to notify the company.


What Is ClickHouse Anyway?

ClickHouse is an open-source columnar database originally developed at Yandex. It’s optimized for high-performance analytical queries on large datasets, making it popular for real-time data processing. ClickHouse operates in the same space as Snowflake, Apache Druid, and Google BigQuery.

Here’s the key takeaway: no matter how advanced a database is, it’s still just that—a database. It requires security controls like any other system. The lesson here is not about which database technology you use but rather about securing your systems properly.


Personal, Operational, and Security Data Together: A Governance Failure

Data governance involves identifying and classifying data to determine its risk level, regulatory requirements, and access restrictions. In this case, personal data (including PII), operational data, and security data were all stored in the same environment in plaintext.

Well-designed systems often separate these classes into different instances. For example, security logs might reside in a dedicated instance with heightened encryption and stricter Role-Based Access Control (RBAC). Proper segmentation would also ensure each instance is protected within its own VPC (Virtual Private Cloud).
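
As a rough illustration of that segmentation idea, the hypothetical sketch below gives each data class its own instance, VPC, encryption requirement, and set of roles; every name in it is invented for the example.

```python
# Hypothetical data-classification policy: each class of data gets a dedicated
# instance in its own VPC, with encryption and RBAC requirements spelled out.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataClassPolicy:
    instance: str           # dedicated database instance / cluster
    vpc: str                # network boundary the instance lives in
    encrypt_at_rest: bool   # must the storage layer be encrypted?
    allowed_roles: tuple    # RBAC roles permitted to read this class

POLICIES = {
    "personal":    DataClassPolicy("analytics-pii", "vpc-pii", True, ("privacy_officer",)),
    "operational": DataClassPolicy("analytics-ops", "vpc-ops", True, ("sre", "analyst")),
    "security":    DataClassPolicy("security-logs", "vpc-sec", True, ("secops",)),
}

def storage_target(data_class: str) -> DataClassPolicy:
    """Look up where a given class of data is allowed to live."""
    return POLICIES[data_class]
```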

Security by design means asking these questions early and integrating solutions into the development process. When done right, deployment becomes smoother and surprises are minimized.


The Core Vulnerabilities: Misconfiguration, Monitoring, and Data Separation

This breach can be broken down into three key vulnerabilities:

  1. Misconfiguration: The ClickHouse database had a public IP, open ports (8123 and 9000), and no authentication.
  2. Data Separation: There was no segmentation between personal, operational, and security-sensitive data. All types of information were stored in the same database without proper access controls or encryption.
  3. Monitoring: There was no real-time monitoring to detect or prevent unauthorized access, and no scanning in place to identify open ports from inside or outside the environment.

Misconfigurations like these are surprisingly common. How often have you deployed a database on a cloud provider in a hurry, without carefully reviewing every security setting? Ever heard of someone accidentally pushing their .env file to GitHub? The list of security mishaps goes on…


Best Practices for Securing Analytical Databases

Avoiding breaches like this doesn’t require reinventing the wheel. Follow these standard best practices:

1. Network Segmentation

  • Use a VPC to isolate your database from public networks.
  • Assign the database a private IP address and restrict access through VPNs or secure gateways (a quick sanity check is sketched below).
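
One cheap guardrail, assuming the database sits behind an internal hostname, is to verify that the endpoint only ever resolves to private addresses. The hostname below is a placeholder:

```python
# Sanity check (hypothetical hostname): the database endpoint should resolve
# only to private, non-routable addresses inside the VPC.
import ipaddress
import socket

DB_HOST = "clickhouse.internal.example"  # placeholder internal hostname

for addr in {info[4][0] for info in socket.getaddrinfo(DB_HOST, 9000)}:
    if not ipaddress.ip_address(addr).is_private:
        raise RuntimeError(f"{DB_HOST} resolves to a public address: {addr}")
```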

2. Access Controls

  • Implement Role-Based Access Control (RBAC) to ensure only authorized users can access sensitive data (an example follows this list).
  • Use multi-factor authentication (MFA) for all admin accounts.
  • Enforce password policies with regular rotation for all accounts.
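
ClickHouse supports SQL-driven access control, so a read-only role can be provisioned in a few statements. The sketch below assumes that feature is enabled for an existing admin account and uses hypothetical names and credentials, sent over the TLS-protected HTTP interface:

```python
# RBAC sketch for ClickHouse using its SQL-driven access control.
# Endpoint, user names, and passwords are hypothetical placeholders.
import requests

URL = "https://clickhouse.internal.example:8443/"   # TLS HTTP interface
ADMIN = {"user": "admin", "password": "CHANGE_ME"}  # existing admin account

STATEMENTS = [
    "CREATE ROLE IF NOT EXISTS analyst",
    "GRANT SELECT ON analytics.* TO analyst",
    "CREATE USER IF NOT EXISTS jane IDENTIFIED WITH sha256_password BY 'S3parate-Str0ng-Pass'",
    "GRANT analyst TO jane",
]

for stmt in STATEMENTS:
    # The HTTP interface accepts the SQL statement as the request body.
    requests.post(URL, params=ADMIN, data=stmt, timeout=10).raise_for_status()
```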

3. Firewall and WAF Protection

  • Deploy a Web Application Firewall (WAF) to block unauthorized traffic.
  • Restrict access to database ports through firewall rules (see the sketch after this list).
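
What those firewall rules look like depends on your platform. As one hypothetical example on AWS, a security group can limit the ClickHouse ports to the VPC’s private range using boto3 (all identifiers below are placeholders):

```python
# Hypothetical AWS sketch: allow ClickHouse ports 8123/9000 only from the
# VPC's private CIDR; nothing is opened to 0.0.0.0/0.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

SECURITY_GROUP_ID = "sg-0123456789abcdef0"  # placeholder security group
PRIVATE_CIDR = "10.0.0.0/16"                # the VPC's internal range

for port in (8123, 9000):
    ec2.authorize_security_group_ingress(
        GroupId=SECURITY_GROUP_ID,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": PRIVATE_CIDR,
                          "Description": "ClickHouse, internal traffic only"}],
        }],
    )
```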

4. Encryption

  • Encrypt data at rest to ensure sensitive information is protected even if the database is compromised.
  • Enable SSL/TLS to encrypt data in transit (illustrated below).
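
On the client side, that means talking only to the TLS endpoint and verifying the server certificate. A minimal sketch, assuming ClickHouse’s HTTPS port (8443) is enabled and an internal CA is in use; all names and paths are placeholders:

```python
# TLS-in-transit sketch (hypothetical endpoint and CA bundle): connect only
# over HTTPS and refuse servers that don't present a trusted certificate.
import requests

URL = "https://clickhouse.internal.example:8443/"  # TLS endpoint, not plain :8123
CA_BUNDLE = "/etc/ssl/internal-ca.pem"             # placeholder internal CA path

resp = requests.post(
    URL,
    params={"user": "analyst", "password": "CHANGE_ME"},
    data="SELECT 1",
    verify=CA_BUNDLE,  # certificate verification against the internal CA
    timeout=10,
)
resp.raise_for_status()
print(resp.text)
```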

5. Vulnerability Scanning

  • Use a combination of internal tools and third-party services to regularly scan your environment for vulnerabilities, including exposed ports; a minimal port check is sketched below.
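
Dedicated scanners do this far more thoroughly, but even a crude check run from outside your network catches the most embarrassing case: an analytics port answering on a public address. A minimal sketch with placeholder targets:

```python
# Minimal external exposure check (placeholder targets): try the ClickHouse
# ports from outside the VPC; any successful connect is a finding.
import socket

TARGETS = ["203.0.113.10"]   # placeholder public addresses to test
PORTS = [8123, 9000, 9440]   # HTTP, native TCP, native TCP over TLS

for host in TARGETS:
    for port in PORTS:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(2)
            if s.connect_ex((host, port)) == 0:
                print(f"EXPOSED: {host}:{port} accepts connections from the internet")
```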

6. Monitoring and Alerts

  • Enable logging and integrate with a SIEM (Security Information and Event Management) platform.
  • Set up alerts for suspicious activities, such as failed login attempts or unusual query patterns (example below).
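
ClickHouse logs its own activity in the system.query_log table, which makes a first-pass monitor easy to sketch. The endpoint, credentials, and expected-user list below are hypothetical, and in practice the findings would feed a SIEM rather than stdout:

```python
# Monitoring sketch: poll ClickHouse's built-in system.query_log for activity
# from unexpected users over the last hour. All names here are placeholders.
import requests

URL = "https://clickhouse.internal.example:8443/"
MONITOR = {"user": "monitor", "password": "CHANGE_ME"}
EXPECTED_USERS = {"analyst", "etl", "monitor"}

SQL = """
SELECT user, count() AS queries
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY user
FORMAT TabSeparated
"""

for row in requests.post(URL, params=MONITOR, data=SQL, timeout=10).text.splitlines():
    user, queries = row.split("\t")
    if user not in EXPECTED_USERS:
        print(f"ALERT: unexpected user '{user}' ran {queries} queries")  # forward to SIEM
```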


Tools and Players in the Security Space

There are numerous tools available to help secure your infrastructure. Key players include:

  • WAF and API Protection: Akamai (Kona Site Defender), Cloudflare, Imperva.
  • Cloud Security Posture Management (CSPM): Wiz, Palo Alto Prisma Cloud, Orca Security.
  • Zero Trust Access: Akamai Enterprise Application Access, Zscaler.
  • Vulnerability Management: Qualys, Tenable Nessus.
  • Database Monitoring: GCP Cloud SQL Insights, Datadog.

These tools work together to provide a multi-layered defense.


Final Thoughts: Security is Everyone’s Responsibility

At the end of the day, securing databases like ClickHouse, Snowflake, or Apache Druid comes down to consistently applying best practices. Security should never take a backseat to development speed.

So, if you’re managing sensitive data, take a moment to double-check your security posture. Are your firewalls up? Is your data segmented? Are you monitoring access?

Don’t wait for your own Wiz Security moment. Secure your stack now and save yourself the headache later. Better yet, reach out to security experts to audit your infrastructure before going live.

Next Up: Bots, AI Models and the new era of digital property protection