As a power user, I am concerned about the possibility of widespread adoption of your product and/or others like it.
I don't want my bank to ban me just because I use a browser extension to capture my own cookies from my own valid session and pipe them into a shell script I wrote to invoke curl to harvest my latest bank statement as a PDF and store it locally.
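For concreteness, the tool is roughly this shape; the URL, cookie file, and output paths below are all made up for illustration:

```python
# Rough sketch of the archive-bank-statement tool described above.
# The bank URL, cookie-jar path, and output names are invented.
import datetime
import pathlib
import subprocess

def build_curl_cmd(cookie_jar, url, out_path):
    """Build the curl invocation: -b reuses the browser-exported session
    cookies, -sSf stays quiet but fails on HTTP errors instead of saving
    an error page as a "PDF"."""
    return ["curl", "-sSf", "-b", cookie_jar, "-o", out_path, url]

def archive_statement(cookie_jar="cookies.txt",
                      url="https://example-bank.example/statements/latest.pdf",
                      out_dir="statements"):
    pathlib.Path(out_dir).mkdir(exist_ok=True)
    out_path = f"{out_dir}/statement-{datetime.date.today():%Y-%m-%d}.pdf"
    subprocess.run(build_curl_cmd(cookie_jar, url, out_path), check=True)
    return out_path
```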
Supposing that your system wouldn't flag that activity as malicious, what about the vulgar things that I did to their servers while I was developing my archive-bank-statement tool?
NB. Please ignore the implication that my tool is complete or useful. It's not... :)
The main idea behind Wallarm is to gain inside knowledge of how the application works and how users use it. Based on this data, we craft dynamic rules for every single application or API.
The simplest example is what data is transmitted in different parameters of form fields or API calls. For example, it's OK if someone puts an SQL injection payload into a form on the Stack Overflow site while writing a security-related article; that can be normal behavior. Meanwhile, an SQL injection payload is probably malicious in a login form on your bank's website.
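A toy illustration of the same payload being fine on one endpoint and suspicious on another; the endpoints, markers, and allow-list here are invented, not our actual rules:

```python
# Same-payload, different-context sketch. Everything here is invented.
SQLI_MARKERS = ("' or 1=1", "union select")

# Endpoints where attack-looking text is legitimately expected,
# e.g. the body field of a security write-up on a Q&A site.
ALLOWS_ATTACK_TEXT = {"/articles/new": True, "/login": False}

def is_suspicious(endpoint, value):
    if ALLOWS_ATTACK_TEXT.get(endpoint, False):
        return False  # normal behavior for this endpoint
    return any(m in value.lower() for m in SQLI_MARKERS)
```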
We wouldn't ban a request only because it is sent with curl. There is a set of different factors and statistics that are taken into account. E.g. if you send these requests too quickly and they are sent with curl, that can be considered malicious activity.
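A toy illustration of combining several weak signals like that; the thresholds, weights, and window size are invented, not production logic:

```python
# Invented scoring sketch: no single factor (like a curl User-Agent)
# triggers a block on its own; signals are combined into a score.
from collections import deque
import time

WINDOW_SECONDS = 10
RATE_LIMIT = 50  # requests per window considered "too quick" in this sketch

class RiskScorer:
    def __init__(self):
        self.timestamps = deque()

    def score(self, user_agent, now=None):
        now = time.time() if now is None else now
        self.timestamps.append(now)
        # keep only requests inside the sliding window
        while self.timestamps and now - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()
        score = 0
        if "curl" in user_agent.lower():
            score += 1  # weak signal on its own
        if len(self.timestamps) > RATE_LIMIT:
            score += 2  # high request rate within the window
        return score    # e.g. act only when score crosses some threshold
```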
I'm glad to see innovation in this area. I have a few questions.
Can you tell me where (or even how) you acquired the data to train your machine learning system?
If you could go into some detail about the specific techniques you've used, that would also be great to know.
Finally, what does your service do that is not provided by something like SiftScience? I imagine there is overlap here - is it that you primarily focus on web application security instead of fraud signaling?
1. Customers analyze traffic with locally installed NGINX-based instances (there is no DNS takeover). They send application/traffic statistics to Wallarm Cloud so we can run the machine-learning stuff. We did a lot of work on the initial training of the system using our own experience in web app security (250+ pentests for top-tier companies, plus a lot of research done by our team, like the SSRF bible). We also use different honeypots and, now, statistics from customers with high-volume traffic.
2. Some details about the ML techniques are covered by Ivan in another comment.
3. We address different tasks than SiftScience. SiftScience provides fraud detection; Wallarm protects web apps and APIs against data breaches. But these tasks are related for some of our customers.
How much of your machine learning is used for understanding the application (as Ivan said elsewhere, clustering login functionality together), and how much is actually used for fingerprinting vulnerability identification attempts on the part of user input?
To place this in a broader context, you do not need machine learning to identify many cases of malicious user input; you can rely on simple heuristics. There is likely no legitimate reason for a user to submit `<script>alert(1);</script>`, which is an obvious test for low-hanging XSS fruit. Any good WAF will catch this.
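For instance, a signature check of this kind can be as simple as the following; the patterns are illustrative only, nowhere near production-grade:

```python
# A few hardcoded signatures of the kind any WAF heuristic would include.
import re

OBVIOUS_PROBES = [
    re.compile(r"<script\b[^>]*>", re.IGNORECASE),  # low-effort XSS test
    re.compile(r"\b(union\s+select|or\s+1\s*=\s*1)\b", re.IGNORECASE),  # classic SQLi
    re.compile(r"\.\./\.\./"),                      # path traversal
]

def looks_like_probe(value):
    return any(p.search(value) for p in OBVIOUS_PROBES)
```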
Given that, does Wallarm use mostly heuristics for identifying malicious user input, or does it also combine machine learning into this process at all to find non-obvious input patterns that could be indicative of penetration testing attempts?
Our attack-type recognition is based on machine learning that first produces lexems and then syntax constructions (patterns) from existing attacks. For example, in the case of memcached injections (more details: https://www.blackhat.com/docs/us-14/materials/us-14-Novikov-...) we can train the system to detect these attacks without regexps or new heuristic rules.
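A rough sketch of that two-stage idea: tokenize the raw value into lexems, then match the lexem sequence against attack-shaped patterns. The token rules and the pattern below are hardcoded for illustration; the point of the ML approach is that such patterns are learned rather than written by hand:

```python
# Stage 1: lex the value; stage 2: look for an attack-shaped token sequence.
import re

TOKEN_SPEC = [
    ("CMD",  re.compile(r"(get|set|delete|incr|flush_all)\b", re.I)),  # memcached verbs
    ("CRLF", re.compile(r"(\r\n|%0d%0a)", re.I)),
    ("NUM",  re.compile(r"\d+")),
    ("WORD", re.compile(r"\w+")),
]

def lexems(value):
    tokens, pos = [], 0
    while pos < len(value):
        for name, pat in TOKEN_SPEC:
            m = pat.match(value, pos)
            if m:
                tokens.append(name)
                pos = m.end()
                break
        else:
            pos += 1  # skip characters no rule matches
    return tokens

def looks_like_memcached_injection(value):
    # a line break followed by a storage command is the shape of the attack
    toks = lexems(value)
    return any(a == "CRLF" and b == "CMD" for a, b in zip(toks, toks[1:]))
```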
CEO and cofounder of Sift Science here. I think we are complementary, actually. Wallarm focuses on security vulnerabilities (like a more automated HackerOne), and we focus more on "application abuse" (user-level fraud).
HackerOne and Bugcrowd do a great job, and we recommend running bug-bounty programs all the time.
But for companies that move fast and deploy code every day (or several times a day) with CI/CD, it's almost impossible not to introduce new vulnerabilities. This is where solutions for continuous security are incredibly helpful.
There are a few different tasks for machine learning here.
1. Traffic clustering (hierarchical clustering algorithms). We use ML to understand how your application works in terms of business logic, e.g. clustering HTTP requests for /login into a cluster determined by (HTTP_header->HOST="yoursite.com" + HTTP_URL->"/login" + ...).
2. Data profiling inside clusters. We use statistical distribution algorithms to understand which data is normal for the fields POST->login and POST->password inside the cluster from p. 1. These are not hardcoded data templates like "only digits" or something similar; Wallarm generates profiles dynamically.
3. Fuzzy search. For the data that is abnormal (from p. 2), we determine whether it looks like XSS, SQLi, or any other attack.
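A toy end-to-end sketch of the three steps above. Plain grouping stands in for hierarchical clustering, value length stands in for the statistical profile, and difflib stands in for the real fuzzy matcher; all names, thresholds, and attack samples are invented for illustration:

```python
# Invented three-step pipeline: cluster -> profile -> classify.
import difflib
import statistics
from collections import defaultdict

# Step 1: group requests into clusters keyed by host + normalized path.
def cluster_requests(requests):
    clusters = defaultdict(list)
    for req in requests:
        key = (req["headers"]["Host"], req["url"].split("?")[0])
        clusters[key].append(req)
    return clusters

# Step 2: learn what "normal" looks like per field (here: just value length).
class FieldProfile:
    def __init__(self):
        self.lengths = []

    def learn(self, value):
        self.lengths.append(len(value))

    def is_abnormal(self, value, z_cutoff=3.0):
        mean = statistics.mean(self.lengths)
        stdev = statistics.pstdev(self.lengths) or 1.0
        return abs(len(value) - mean) / stdev > z_cutoff

# Step 3: fuzzy-match abnormal values against known attack samples.
KNOWN_ATTACKS = {
    "xss": ["<script>alert(1)</script>", "<img src=x onerror=alert(1)>"],
    "sqli": ["' or 1=1 --", "union select null,null --"],
}

def classify(value, threshold=0.6):
    best_label, best_ratio = None, 0.0
    for label, samples in KNOWN_ATTACKS.items():
        for sample in samples:
            ratio = difflib.SequenceMatcher(None, value.lower(), sample).ratio()
            if ratio > best_ratio:
                best_label, best_ratio = label, ratio
    return best_label if best_ratio >= threshold else None
```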