Launch HN: API Tracker (YC W20) – Track and manage the APIs you use

dolftax · on Feb 18, 2020

We've been using API Tracker in production for few weeks now. The primary use case for us is to reliably handle webhooks from GitHub which our product relies heavily on (app installation, commit and pull request events).

Unfortunately, GitHub doesn't retry any failed webhooks and when our service goes down for a few seconds, thousands of webhooks fail and pile up. GitHub doesn't provide an API to query the failed webhooks and retry as well. We had to go through the painstaking task of visiting GitHub's app dashboard and click retry on each webhook, one by one.

With API tracker in place, we've updated our GitHub app's webhook delivery URL to send the webhooks to API tracker and they forward it to our services. In worst case when our service goes down for a while, API tracker gracefully retries all the failed webhooks.

Ref: https://github.community/t5/GitHub-API-Development-and/Handl...

thorgaardian · on Feb 18, 2020

Interesting use-case for it. Without prior knowledge of a solution like this I would have suggested you send the webhooks to a queue backed notification system (e.g. SNS backed by SQS) and subscribe to the event topic, but sounds must easier to configure and manage the way you instrumented it. Might be a good use-case for me to try out!

cameroncooper · on Feb 18, 2020

This is something you can easily configure with our automatic retry function. We have an option to return a pre-configured response to the caller, and put the request in a queue to be retried until successful. This allows you to have a sustained outage while making sure all calls are eventually delivered.

ignoramous · on Feb 18, 2020

> This allows you to have a sustained outage while making sure...

Re-driving queue backlogs at services recovering from sustained outages ends in tears almost always. Tread carefully. :)

jrockway · on Feb 19, 2020

Typically people use two pools for circuit breaking, with the limit set lower on retries: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...

capableweb · on Feb 18, 2020

Yeah, this is what I've seen most services who rely on webhooks from another service to do. Add in some monitoring of how many events are not yet processed (set a alarm when there is X amount of events in it) and you're done!

disposedtrolley · on Feb 18, 2020

We're currently building a GitHub integration which receives webhooks and kicks off a bunch of processing actions based on the event type. Your suggestion sounds like a great way to add some observability to the service -- thanks!

bpicolo · on Feb 18, 2020

> In worst case when our service goes down for a while

The worst case is still the same, no? API tracker goes down, GitHub has no redelivery, same deal. More a matter of whose uptime you trust more in this regard.

(That's not to say it's not valuable for this use case)

dolftax · on Feb 18, 2020

Sure. The least we expect from any service sending webhooks is built-in retry strategy. GitHub doesn't. We were thinking of building this ourselves internally but if someone takes care of this for you reliably, why not.

For API tracker, even if their services go down for a short while, it isn't good for business. Though it's been only few weeks using API tracker, we had zero failed webhook deliveries. They say they've designed their systems with this as a primary goal, of course. What if AWS or GCP goes down. It's a matter of trust and SLAs.

ignoramous · on Feb 18, 2020

> What if AWS or GCP goes down. It's a matter of trust and SLAs.

AWS does have a 100% uptime SLA on some of its services: Route53, for example [0]. Not saying that ApiTracker could not be a 100% uptime service (in fact, it looks like that's their explicit goal), just pointing out that AWS / GCP do have services that never "go down" barring global catastrophes.

[0] https://aws.amazon.com/blogs/architecture/a-case-study-in-gl... -- Route 53’s foremost goal is to always meet our promise of a 100% SLA for DNS queries – that all of our customers’ DNS names should resolve all the time.

ignoramous · on Feb 18, 2020

Thanks.

At $349 for 1M calls, doesn't it get expensive? I'd reckon, web-hooking it to Step Functions + AWS Lambda or SNS + SQS would have been a much cost effective solution at the cost of additional resources devoted to development and maintanence, of course. So, if you're comfortable sharing, what did the TCO economics look like for you when you decided to use ApiTracker instead?

the_arun · on Feb 19, 2020

Don't integrators like IFTTT already support GitHub integration? How is API Tracker different from IFTTT?

orliesaurus · on Feb 18, 2020

There have been a number of players in this area throughout the years (Galileo [RIP], Runscope [semi-RIP], Newrelic just to mention a few) for the analytical part ... and countless more for the proxying part (Kong, Envoy, Tyk, etc)

Can you elaborate a little bit more where you place yourself in the market? Why should someone trust you over any of the bigger, older and more stable competitors? Thanks

cameroncooper · on Feb 18, 2020

You're right that there are a number of proxy solutions out there, but most are focused on exposing an API for external consumption (i.e. API producers). We think that by focusing on outbound API calls we can go deep on features that make less sense in those products. The same is true for the analytics solutions (i.e. Newrelic). For example it wouldn't make sense for them to add automatic retry or request caching, but its still a common pain point with integrations and makes a lot of sense for us to build. Finally, some of the tools (i.e. Runscope) are meant for development debugging and don't solve the production pain point.

thorgaardian · on Feb 18, 2020

What you described in the first sentence is commonly referred to as an API gateway - protecting ingress traffic into a publicly accessible service/app (e.g. Kong, AWS API gateway, Ambassador, etc). Lately there's been a lot more generalized solutions in this category for inter-process communication via service meshes like Istio, Gloo, AWS AppMesh, and others - all of which seem to offer a solution that works for both internal traffic routing as well as external (when whitelisted).

Can you offer a description of your product that differentiates it from service mesh solutions? Did you build your own proxy software, or are you built on top of Envoy like many of the other available solutions?

cameroncooper · on Feb 18, 2020

We are not built on top of Envoy and have built our own proxy.

Many of the service mesh solutions require you to deploy and manage them as an on-premise installation. Our primary offering is a hosted solution, but also offer a managed service for on-premise installations.

As you've correctly pointed out the service mesh solutions can allow routing of external traffic, but by focusing on the external calls there are features that make sense for us to build that wouldn't make sense in something like Istio/Gloo/AppMesh. For example, we can build an enhanced experience around third-party APIs to better understand the calls, errors, quotas, etc that are specific to that provider.

thorgaardian · on Feb 18, 2020

That last paragraph is an interesting addition I handn’t considered actually, so great answer! While I’d be hesitant to use a 3rd party, hosted solution for this use case, I can also see how that affords you the ability to optimize fullfilment of requests per destination across all your users. Is it safe to assume that long term you’ll offer this to larger customers via private installation to alleviate security and latency concerns while still benefiting from the destination knowledge of the central hub to configure routing rules?

divbzero · on Feb 18, 2020

Congratulations on the launch!

And thanks for your explanations on how your proxy is similar to and different from API gateway or service mesh solutions.

Having worked on both production monitoring and an API gateway for a Fortune 100 company, I would consider monitoring and proxy to each be valuable in its own right and can envision scenarios where I’d want a standalone product offering for one but not the other.

candiddevmike · on Feb 18, 2020

Why did you build your own proxy instead of using envoy? What short comings did envoy have?

cameroncooper · on Feb 18, 2020

We wanted to architect a system that made it easy to deploy proxy nodes to multiple regions and clouds. We also wanted it to be easy to add functionality specific to our feature set. While we might have been able to achieve our goals by modifying an existing proxy, it made more sense to us to build our own. I have built proxies in previous companies and this was something I was very comfortable doing.

candiddevmike · on Feb 18, 2020

Can you expand on what specific part of envoy prohibited that?

Additionally, as other commenters mentioned, almost every company has rallied around Envoy and is spending considerable time/money making it better. If your solution isn't as performant as envoy, it seems like a poor architecture choice to roll your own, especially given the time/money constraints startups have.

thdxr · on Feb 18, 2020

This is great, I can see the potential of something like this and am jealous I'm not the one working on it!

Don't take the pushback in the other comments too seriously. There is definitely an audience (myself included) who'd want a focus, specific tool

james_s_tayler · on Feb 19, 2020

Ditto. I face the problem this solves every day and from time to time think about the fact someone must be trying to solve this problem.

incognos · on Feb 19, 2020

It is a nice solution but I am weary of anything that proxies my traffic. Especially considering the legislative environment. I've been using Bearer [https://www.bearer.sh] which does not use proxy but a library that hooks into the low level calls - It gives us a great view of what is going on with our third party API calls. You can filter out the calls that do not interest you and separate Production from Staging etc... I did not want to have to build the monitoring infra myself, not a core competency and for the money, it is cheaper to use an external service over 5 years vs. building in-house.

openthc · on Feb 18, 2020

We've had to build similar tools -- but one step further to make three different upstream services behave in a common way. We also added pre&post flight error checking for cases where the backend wouldn't behave nice.

Any plans to "commonize" some different-backends like Twilio / Plivo, or SendGrid, Mandrill, etc, etc?

Very nice work!

cameroncooper · on Feb 18, 2020

Thanks for sharing your experience, we have heard similar things from other companies. We do have plans to create common interfaces for different services like SMS/email as you have suggested. This will allow us to seamlessly fail-over between providers to maintain uptime and performance without any action on the client part.

time0ut · on Feb 19, 2020

Congratulations on your launch! This is very interesting. I have a few questions. I apologize if your website answers these, but I couldn't find clear answers after a cursory glance:

Can you tell me more about how the on-premise installation works and/or is licensed?

Can it manage my authentication mechanisms for me? For example, can I configure it with my client side certificates or have it fetch and cache oauth tokens? We do this in our current solution and it is very nice being able to hide all these details from our applications.

Can it do request/response transformation at all? We have a lot of cases where we want to massage things a little here and there. I realize this might be out of the scope of what you are trying to do, but it would be a nice to have.

We currently do this sort of stuff with a cluster of IBM Datapower gateways. They perform very well but are expensive, difficult to configure, and somewhat opaque.

cameroncooper · on Feb 19, 2020

The standard model for on-premise is an annually licensed managed service. We deploy, manage and monitor the platform on the customer's resources (usually AWS account).

Great questions on credential management and transformations. These are not in the offering today, but they are on our near-term roadmap and we are very excited about their potential. As you've alluded to, there's a lot we can do there.

time0ut · on Feb 19, 2020

Thank you for your response.

I'll be keeping an eye out for enhancements. We have to renew our Datapower licenses annually and are always on the look out for a replacement.

mc3 · on Feb 19, 2020

Hi I have two questions:

1. How can I be sure sensitive date sent via the APIs is secure / private etc?

2. Is you reliability and availability 100%, because if I use you my app's availability is now only as good as yours. We've been bitten by cascading effects of outages of upstream cloud services, but something like this would knockout everything I guess if it was down.

tonylucas · on Feb 18, 2020

Have just signed up, was (yet again) looking for a solution like this for monitoring outbound API calls. Look forward to trying it

sachinag · on Feb 18, 2020

https://cloud.ibm.com/catalog/services/api-connect seems to do a lot of this for free. Probably could also use the community version of Mulesoft: https://developer.mulesoft.com/mulesoft-products-and-licensi...

erik_landerholm · on Feb 18, 2020

Two of the last companies I'd ever want to work with or rely on, other than that...

hn_throwaway_99 · on Feb 19, 2020

Congrats on the launch. I have a ton of 3rd party APIs I'm integrating with, so like you have been thinking about all the stuff I'll need to do to make it reliable in production.

What do you guys do for masking or encrypting sensitive data? I like the opportunity to log everything but a lot of what I'd want to log is PII or sensitive financial data.

cameroncooper · on Feb 19, 2020

We have two approaches to securing this kind of data. Once you specify what fields you want secured we can simply mask it out, or we can hash the data in a way that allows you to search for it if you know the value you are looking for.

FanaHOVA · on Feb 18, 2020

Started using apitracker a week or two ago; it's been great for logging requests and inspecting failed/slow ones. Haven't tried automated retrying yet, but excited to do that soon as well.

cameroncooper · on Feb 18, 2020

Glad it's been able to help! Please let us know if there's anything else we can do.

jconley · on Feb 19, 2020

Interesting service. Have built things like this a couple times. How is the on-premise version priced? Didn't see that on the site anywhere.

cameroncooper · on Feb 19, 2020

Thanks! Our standard model for on-premise is an annual license and depends on some factors such as request volume and features.

datboitom · on Feb 18, 2020

Is this any different than Bearer.sh?

cameroncooper · on Feb 19, 2020

Yes it is. Bearer relies on client side instrumentation. Today that is limited to just Node.js and Ruby applications. While we also support client side instrumentation, the proxy is an important element in our offering because it is language agnostic and enables a new class features that can only be implemented in the proxy (e.g. caching).

gmontard · on Feb 19, 2020

Hi, I'm the co-founder of Bearer.sh.

Indeed Bearer.sh works as a package (Gem, NPM) inside your application, and it automatically instruments your HTTP stack, meaning there are zero-code changes to do on your existing integrations to make it works instantly.

But more interestingly, since we're not a proxy at all, it means you don't have to trust us to deliver that very important API traffic of yours (who would?), offer a sub-millisecond impact on your performance and works with any public, private or crazy certificate or IP restricted APIs! APIs are a liability and dependence to your app, let's not add us to that list!

We're going to launch support for many other stacks soon, and also a whole new set of "active features" as you mentioned, by still beeing 100% NOT a proxy - stay tuned in the coming days :)

Feel free to try, we offer 1M API Call per month for free and you can quickly jump to 20M for $49 only.

We're super happy to see all of the interest around that space these days, let's change the API space altogether

incognos · on Feb 19, 2020

You said it better than I did. That was one of the issues we ran into since we have third party APIs that require IP whitelisting, certs and VPNs, a proxy just won't work in that instance. Can't wait for the Python implementation...

ignoramous · on Feb 18, 2020

Would it be right to say this is sentry.io meets envoy, grpc, and konghq? Super interesting. Congratulations.

How do I manage my API integrations, you ask?

Global Accelerator (GLA) is a key infrastructure piece for a HA service I'm building but for the data-plane. It is such a hassle-free but slightly expensive way to vend anycast IPs (no need to purchase ASNs and/or announce routes from colos across the globe) and have the traffic load-balanced to 25+ AWS regions, that I recommend it instantly to anyone architecting HA services. https://fly.io and https://stackpath.com/edge-computing are viable alternatives. Cloudflare announced MagicTransit which isn't as smooth as AWS GLA in terms of developer experience, whilst Azure and Google offer global-load-balancers, too, and may be even before AWS announced it in 2018? So, really, I think utilizing GLA is something folks should do if they run global HA services. The only issue with using NLB behind AWS GLA is the client-IP is not preserved. In our case, we needed it, so we had to get creative with sticky routing and port assignment (listeners) to do load-balancing / traffic-shaping.

Another HA trick I plan to employ is to use Cloudflare-Workers (200+ PoPs) to front https-traffic to our control-plane endpoints. It lacks observability, monitoring, and alerting unless you're on Cloudflare's enterprise plans. The rate-limiting option is expensive ($0.05 per 10k good requests). I'm sure there's no way to queue requests out-of-the-box, so I can very much see a need for what you've built, and where you guys fit in.

To be honest, I'd be surprised if firebase or API Gateway or KongHQ don't already do what you do, as well. Is that case? If so, keep at it. It is a real need. And as you point out, something that I've had to build for every service and integration point.

A few questions (I went through your website and docs, but here I am):

- How do you handle secrets that the clients might need to share with your service, like Apikeys or Access/SecretKeys?

- Do you also push logs to the customers in addition to them pulling it from your endpoints / UI?

- A bit curious about your logging, monitoring, and alerting infrastructure-- Is it ran on top of CloudWatch or Prometheus or Loggly or Elasticsearch or Lightstep or...?

- Do you support proxying http/REST APIs only?

https://autocode.stdlib.com/ which was discussed a few weeks ago here looks, to me, like a good addition to what you're building.

cameroncooper · on Feb 18, 2020

Thanks for sharing your experience. We love GLA as well.

Great questions.

- For sensitive fields that you do not want retained or searchable, we can mask them out.

- We don't currently have integrations to push our logs to another service, but this is a good use case for us and it's on our near term roadmap.

- We use Elasticsearch in the product, but we also use CloudWatch extensively for our own operations.

- Right now we only support proxying HTTP requests, but are open to supporting other protocols.

ignoramous · on Feb 18, 2020

Thanks a lot, Cameron. I'll watch this space [0] as you continue to add features and improve upon efficiency to pass on the cost savings to your customers :) All the best!

[0] I'd have opted for a newsletter, but I couldn't find any sign-ups forms for it.

notlukesky · on Feb 19, 2020

Is there an SLA roadmap?

derricgilling · on Feb 18, 2020

We at Moesif (https://www.moesif.com/solutions/track-third-party-api) released a similar tool in 2017 and found that many of our customers including Deloitte, UPS, Snap Kitchen, iFit, and Trung's previous company, Snap Kitchen were looking for a way to track APIs without the complexity of a full service mesh like Envoy. Especially if you're hosted in something that cannot run an on-prem service mesh or gateway.

We're a little different in that we also support agent-based rather than just proxy. Meaning we have an SDK that sits out-of-band.