Launch HN: API Tracker (YC W20) – Track and manage the APIs you use
125 points by cameroncooper on Feb 18, 2020 | 48 comments
Hey HN!

We’re Cameron, Trung and Matt from API Tracker (https://www.apitracker.com). We make tools to help with using third-party APIs in production.

When software teams integrate with APIs they often run into outages, network issues, interface changes, or even bugs that cause unexpected behavior in the rest of their system. These problems are hard to predict and prepare for, so most teams don't deal with them until there's an outage and they have to do an emergency build to add logging and get to a root cause.

This is what happened to us. Trung and I are both software engineers and we spent a lot of time and energy trying to make our API integrations robust and reliable in production. We found ourselves instrumenting all our API calls so we could know how many calls we were making, how long they were taking and if they were failing. We set up alerts for errors and latency increases and integrated with PagerDuty. We wrote retry logic with exponential backoff. We wrote failover from one API provider to another. At the end of it all we built a lot of tooling that required maintenance and wasn’t even applied uniformly across all of our integrations.
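To make that concrete, here's a minimal sketch of the retry-with-backoff pattern we mean (illustrative Python, not our production code; the URL, thresholds, and retry policy are placeholders):

```python
import random
import time

import requests

def call_with_retry(url, max_attempts=5, base_delay=0.5):
    """Call a third-party API, retrying transient failures with
    exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            resp = requests.get(url, timeout=10)
            if resp.status_code < 500 and resp.status_code != 429:
                return resp  # success, or a non-retryable client error
        except requests.RequestException:
            pass  # timeout/connection error: fall through to backoff
        # Full-jitter exponential backoff: up to 0.5s, 1s, 2s, 4s, ...
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError(f"{url}: still failing after {max_attempts} attempts")
```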

After building all this infrastructure we realized that many other teams are reinventing the same wheel.

To solve this problem we built an API proxy that takes requests and relays them to the API provider. By proxying this traffic we are able to instrument each call to measure latency, record status codes, headers and bodies, and add reliability features like automatic retry with exponential backoff. From there we can monitor and alert on issues and provide a searchable call log for debugging and auditability.
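In miniature, that proxy pattern looks something like this (a toy Flask sketch against a hypothetical upstream; the real service also records headers and bodies, retries, and alerts):

```python
import time

import flask
import requests

app = flask.Flask(__name__)
UPSTREAM = "https://api.example.com"  # hypothetical API provider

@app.route("/<path:path>", methods=["GET", "POST", "PUT", "DELETE"])
def proxy(path):
    started = time.monotonic()
    upstream = requests.request(
        method=flask.request.method,
        url=f"{UPSTREAM}/{path}",
        headers={k: v for k, v in flask.request.headers if k.lower() != "host"},
        data=flask.request.get_data(),
        timeout=30,
    )
    latency_ms = round((time.monotonic() - started) * 1000, 1)
    # A real proxy would ship this record to durable storage and alerting.
    print({"path": path, "status": upstream.status_code, "latency_ms": latency_ms})
    hop_by_hop = {"connection", "content-encoding", "transfer-encoding", "content-length"}
    headers = [(k, v) for k, v in upstream.headers.items() if k.lower() not in hop_by_hop]
    return flask.Response(upstream.content, upstream.status_code, headers)
```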

We knew that because we were asking teams to run their mission-critical API calls through us, we had to build a highly available and scalable proxy architecture. We've done this by designing a proxy that can be distributed across multiple regions and clouds. We are currently running out of AWS: Global Accelerator lets us use their private internet backbone to quickly get traffic to our proxies, which run behind AWS Network Load Balancers. While this helps ensure resilience against infrastructure outages, we also need to protect against self-inflicted wounds like bugs and bad deployments. On each release we bring up a new set of proxy instances, deploy the code, and run our full test suite to make sure each instance is able to proxy requests correctly. Only once all instances are healthy do they go into the load balancer.

For companies with more stringent needs we support on-premise installations as well as a client-side SDK that can do instrumentation without the proxy.

Today we offer the service as a subscription. We hope to make it easy for teams to get visibility and control across all their integrations without having to build it themselves. This includes:

- Detailed logging on all of their third-party API calls

- Monitoring and alerting for increased latency and error rates

- Reliability features like automatic retry, circuit breaker, and request queueing (a sketch of the circuit-breaker pattern follows this list)

- Rate limit and quota monitoring
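To make the circuit-breaker item above concrete, here's a stripped-down sketch of the pattern (illustrative only, not our production implementation):

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, fail fast for `cooldown`
    seconds instead of hammering an API that is already struggling."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # any success resets the count
        return result

# Usage: breaker = CircuitBreaker(); breaker.call(requests.get, some_url)
```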

We would love to hear how the HN community manages their API integrations. Our story reflects our own experiences, but we know this community has seen it all. Tell us about problems you've had and how you dealt with them: please leave a comment or email us at founders@apitracker.com. Looking forward to the discussion!



We've been using API Tracker in production for a few weeks now. The primary use case for us is reliably handling webhooks from GitHub, which our product relies on heavily (app installation, commit, and pull request events).

Unfortunately, GitHub doesn't retry any failed webhooks, so when our service goes down for a few seconds, thousands of webhooks fail and pile up. GitHub also doesn't provide an API to query the failed webhooks and retry them. We had to go through the painstaking task of visiting GitHub's app dashboard and clicking retry on each webhook, one by one.

With API Tracker in place, we've updated our GitHub app's webhook delivery URL to send the webhooks to API Tracker, and they forward them to our services. In the worst case, when our service goes down for a while, API Tracker gracefully retries all the failed webhooks.

Ref: https://github.community/t5/GitHub-API-Development-and/Handl...


Interesting use case. Without prior knowledge of a solution like this I would have suggested sending the webhooks to a queue-backed notification system (e.g. SNS backed by SQS) and subscribing to the event topic, but it sounds much easier to configure and manage the way you instrumented it. Might be a good use case for me to try out!
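For anyone curious, the queue-backed approach I had in mind looks roughly like this (hypothetical topic ARN; an SQS queue subscribed to the topic buffers events for a consumer to drain at its own pace):

```python
import json

import boto3

sns = boto3.client("sns")
# Hypothetical topic; an SQS queue subscribed to it absorbs bursts.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:github-webhooks"

def handle_webhook(event_type, payload):
    """Publish an incoming GitHub webhook to SNS so a queue-backed
    consumer can process (and re-drive) it independently."""
    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps(payload),
        MessageAttributes={
            "event": {"DataType": "String", "StringValue": event_type},
        },
    )
```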


This is something you can easily configure with our automatic retry function. We have an option to return a pre-configured response to the caller, and put the request in a queue to be retried until successful. This allows you to have a sustained outage while making sure all calls are eventually delivered.
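Conceptually the flow is: return a canned response immediately, park the request, and re-drive it until the target recovers. A rough sketch of the idea (not our actual implementation):

```python
import queue
import threading
import time

import requests

retry_q = queue.Queue()

def accept(target_url, payload):
    """Ack the caller right away with a pre-configured response and
    queue the request for asynchronous delivery."""
    retry_q.put((target_url, payload, 0))
    return {"status": "accepted"}

def deliver_forever():
    while True:
        url, payload, attempt = retry_q.get()
        try:
            requests.post(url, json=payload, timeout=10).raise_for_status()
        except Exception:
            # Back off (capped) and re-queue. A real system would also
            # distinguish retryable errors and dead-letter poison messages.
            time.sleep(min(2 ** attempt, 60))
            retry_q.put((url, payload, attempt + 1))

threading.Thread(target=deliver_forever, daemon=True).start()
```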


> This allows you to have a sustained outage while making sure...

Re-driving queue backlogs at services recovering from sustained outages almost always ends in tears. Tread carefully. :)


Typically people use two pools for circuit breaking, with the limit set lower on retries: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...
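In code terms, the two-pool idea is just separate concurrency budgets, with the retry budget set much lower so a retry storm can't starve first-try traffic. Envoy implements this natively (see the link); a rough Python illustration of the concept:

```python
import threading

# Normal requests draw from a large pool; retries from a much smaller one.
request_pool = threading.BoundedSemaphore(1024)
retry_pool = threading.BoundedSemaphore(3)

def send(do_request, is_retry=False):
    pool = retry_pool if is_retry else request_pool
    if not pool.acquire(blocking=False):
        # Pool exhausted: shed load instead of piling on more work.
        raise RuntimeError("circuit open for this pool")
    try:
        return do_request()
    finally:
        pool.release()
```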


Yeah, this is what I've seen most services that rely on webhooks from another service do. Add in some monitoring of how many events are not yet processed (set an alarm when there are X events in the queue) and you're done!


We're currently building a GitHub integration which receives webhooks and kicks off a bunch of processing actions based on the event type. Your suggestion sounds like a great way to add some observability to the service -- thanks!


> In the worst case, when our service goes down for a while

The worst case is still the same, no? API tracker goes down, GitHub has no redelivery, same deal. More a matter of whose uptime you trust more in this regard.

(That's not to say it's not valuable for this use case)


Sure. The least we expect from any service sending webhooks is a built-in retry strategy. GitHub doesn't have one. We were thinking of building this ourselves internally, but if someone takes care of it for you reliably, why not.

For API Tracker, even if their services go down for a short while, it isn't good for business. Though it's been only a few weeks using API Tracker, we've had zero failed webhook deliveries. They say they've designed their systems with this as a primary goal, of course. What if AWS or GCP goes down? It's a matter of trust and SLAs.


> What if AWS or GCP goes down? It's a matter of trust and SLAs.

AWS does have a 100% uptime SLA on some of its services: Route53, for example [0]. Not saying that ApiTracker could not be a 100% uptime service (in fact, it looks like that's their explicit goal), just pointing out that AWS / GCP do have services that never "go down" barring global catastrophes.

[0] https://aws.amazon.com/blogs/architecture/a-case-study-in-gl... -- Route 53’s foremost goal is to always meet our promise of a 100% SLA for DNS queries – that all of our customers’ DNS names should resolve all the time.


Thanks.

At $349 for 1M calls, doesn't it get expensive? I'd reckon webhooking it to Step Functions + AWS Lambda or SNS + SQS would have been a much more cost-effective solution, at the cost of additional resources devoted to development and maintenance, of course. So, if you're comfortable sharing, what did the TCO economics look like for you when you decided to use ApiTracker instead?


Don't integrators like IFTTT already support GitHub integration? How is API Tracker different from IFTTT?


There have been a number of players in this area throughout the years (Galileo [RIP], Runscope [semi-RIP], New Relic, just to mention a few) for the analytical part ... and countless more for the proxying part (Kong, Envoy, Tyk, etc.)

Can you elaborate a little bit more where you place yourself in the market? Why should someone trust you over any of the bigger, older and more stable competitors? Thanks


You're right that there are a number of proxy solutions out there, but most are focused on exposing an API for external consumption (i.e. API producers). We think that by focusing on outbound API calls we can go deep on features that make less sense in those products. The same is true for the analytics solutions (e.g. New Relic): it wouldn't make sense for them to add automatic retry or request caching, but it's still a common pain point with integrations and makes a lot of sense for us to build. Finally, some of the tools (e.g. Runscope) are meant for development debugging and don't solve the production pain point.


What you described in the first sentence is commonly referred to as an API gateway: protecting ingress traffic into a publicly accessible service/app (e.g. Kong, AWS API Gateway, Ambassador). Lately there have been a lot more generalized solutions in this category for inter-process communication via service meshes like Istio, Gloo, AWS App Mesh, and others, all of which seem to offer a solution that works for both internal traffic routing and external (when whitelisted).

Can you offer a description of your product that differentiates it from service mesh solutions? Did you build your own proxy software, or are you built on top of Envoy like many of the other available solutions?


We are not built on top of Envoy and have built our own proxy.

Many of the service mesh solutions require you to deploy and manage them as an on-premise installation. Our primary offering is a hosted solution, but we also offer a managed service for on-premise installations.

As you've correctly pointed out, the service mesh solutions can allow routing of external traffic, but by focusing on external calls there are features that make sense for us to build that wouldn't make sense in something like Istio/Gloo/App Mesh. For example, we can build an enhanced experience around third-party APIs to better understand the calls, errors, quotas, etc. that are specific to that provider.


That last paragraph is an interesting addition I hadn't considered, actually, so great answer! While I'd be hesitant to use a 3rd-party, hosted solution for this use case, I can also see how that affords you the ability to optimize fulfillment of requests per destination across all your users. Is it safe to assume that long term you'll offer this to larger customers via private installation, to alleviate security and latency concerns while still benefiting from the destination knowledge of the central hub to configure routing rules?


Congratulations on the launch!

And thanks for your explanations on how your proxy is similar to and different from API gateway or service mesh solutions.

Having worked on both production monitoring and an API gateway for a Fortune 100 company, I would consider monitoring and proxy to each be valuable in its own right and can envision scenarios where I’d want a standalone product offering for one but not the other.


Why did you build your own proxy instead of using Envoy? What shortcomings did Envoy have?


We wanted to architect a system that made it easy to deploy proxy nodes to multiple regions and clouds. We also wanted it to be easy to add functionality specific to our feature set. While we might have been able to achieve our goals by modifying an existing proxy, it made more sense to us to build our own. I have built proxies in previous companies and this was something I was very comfortable doing.


Can you expand on what specific part of Envoy prohibited that?

Additionally, as other commenters mentioned, almost every company has rallied around Envoy and is spending considerable time/money making it better. If your solution isn't as performant as Envoy, it seems like a poor architectural choice to roll your own, especially given the time/money constraints startups have.


This is great; I can see the potential of something like this, and I'm jealous I'm not the one working on it!

Don't take the pushback in the other comments too seriously. There is definitely an audience (myself included) who'd want a focused, specific tool.


Ditto. I face the problem this solves every day, and from time to time I think about the fact that someone must be trying to solve it.


It is a nice solution, but I am wary of anything that proxies my traffic, especially considering the legislative environment. I've been using Bearer [https://www.bearer.sh], which does not use a proxy but a library that hooks into the low-level calls. It gives us a great view of what is going on with our third-party API calls. You can filter out the calls that do not interest you, separate Production from Staging, etc. I did not want to have to build the monitoring infra myself; it's not a core competency, and for the money it is cheaper to use an external service over 5 years vs. building in-house.


We've had to build similar tools, but one step further: to make three different upstream services behave in a common way. We also added pre- and post-flight error checking for cases where the backend wouldn't behave nicely.

Any plans to "commonize" some different-backends like Twilio / Plivo, or SendGrid, Mandrill, etc, etc?

Very nice work!


Thanks for sharing your experience; we have heard similar things from other companies. We do have plans to create common interfaces for different services like SMS/email, as you've suggested. This will allow us to seamlessly fail over between providers to maintain uptime and performance without any action on the client's part.
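As a sketch of what such a common interface with failover could look like (hypothetical names, not our shipped API):

```python
class SmsProvider:
    """Hypothetical common interface over interchangeable SMS backends
    (e.g. adapters wrapping Twilio or Plivo clients)."""

    def send(self, to: str, body: str) -> None:
        raise NotImplementedError

def send_with_failover(providers: list[SmsProvider], to: str, body: str) -> None:
    """Try each provider in order, failing over on error so one vendor
    outage doesn't stop delivery."""
    last_error = None
    for provider in providers:
        try:
            return provider.send(to, body)
        except Exception as exc:
            last_error = exc  # note the failure and try the next backend
    raise RuntimeError("all SMS providers failed") from last_error
```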


Congratulations on your launch! This is very interesting. I have a few questions. I apologize if your website answers these, but I couldn't find clear answers after a cursory glance:

Can you tell me more about how the on-premise installation works and/or is licensed?

Can it manage my authentication mechanisms for me? For example, can I configure it with my client side certificates or have it fetch and cache oauth tokens? We do this in our current solution and it is very nice being able to hide all these details from our applications.
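(For reference, the token-fetching-and-caching piece we'd want the gateway to absorb is roughly this standard OAuth2 client-credentials flow; the endpoint and credentials here are illustrative:)

```python
import time

import requests

_cache = {"token": None, "expires_at": 0.0}

def get_token(token_url, client_id, client_secret):
    """Fetch and cache an OAuth2 access token so applications never
    have to handle auth details themselves."""
    if time.time() < _cache["expires_at"] - 30:  # 30s safety margin
        return _cache["token"]
    resp = requests.post(token_url, data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    _cache["token"] = body["access_token"]
    _cache["expires_at"] = time.time() + body["expires_in"]
    return _cache["token"]
```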

Can it do request/response transformation at all? We have a lot of cases where we want to massage things a little here and there. I realize this might be out of the scope of what you are trying to do, but it would be a nice to have.

We currently do this sort of stuff with a cluster of IBM Datapower gateways. They perform very well but are expensive, difficult to configure, and somewhat opaque.


The standard model for on-premise is an annually licensed managed service. We deploy, manage and monitor the platform on the customer's resources (usually AWS account).

Great questions on credential management and transformations. These are not in the offering today, but they are on our near-term roadmap and we are very excited about their potential. As you've alluded to, there's a lot we can do there.


Thank you for your response.

I'll be keeping an eye out for enhancements. We have to renew our Datapower licenses annually and are always on the lookout for a replacement.


Hi, I have two questions:

1. How can I be sure sensitive data sent via the APIs is secure / private, etc.?

2. Is your reliability and availability 100%? Because if I use you, my app's availability is only as good as yours. We've been bitten by cascading effects of outages of upstream cloud services, and something like this would knock out everything, I guess, if it went down.


Just signed up; I was (yet again) looking for a solution like this for monitoring outbound API calls. Looking forward to trying it.


https://cloud.ibm.com/catalog/services/api-connect seems to do a lot of this for free. Probably could also use the community version of Mulesoft: https://developer.mulesoft.com/mulesoft-products-and-licensi...


Two of the last companies I'd ever want to work with or rely on, other than that...


Congrats on the launch. I have a ton of 3rd-party APIs I'm integrating with, so, like you, I have been thinking about all the stuff I'll need to do to make it reliable in production.

What do you guys do for masking or encrypting sensitive data? I like the opportunity to log everything but a lot of what I'd want to log is PII or sensitive financial data.


We have two approaches to securing this kind of data. Once you specify which fields you want secured, we can simply mask them out, or we can hash the data in a way that allows you to search for it if you know the value you are looking for.
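The searchable option is essentially a keyed hash: we digest the field with a secret key, and you find it later by hashing the value you're searching for with the same key. A simplified sketch (not our exact scheme):

```python
import hashlib
import hmac

SECRET = b"per-tenant secret key"  # hypothetical; held server-side

def mask(value: str) -> str:
    """Redact a sensitive field entirely."""
    return "***"

def searchable_hash(value: str) -> str:
    """Deterministic keyed digest: logs store the digest instead of the
    plaintext, and a search for a known value hashes it the same way
    and matches on the digest."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
```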


Started using apitracker a week or two ago; it's been great for logging requests and inspecting failed/slow ones. Haven't tried automated retrying yet, but excited to do that soon as well.


Glad it's been able to help! Please let us know if there's anything else we can do.


Interesting service. Have built things like this a couple times. How is the on-premise version priced? Didn't see that on the site anywhere.


Thanks! Our standard model for on-premise is an annual license and depends on some factors such as request volume and features.


Is this any different than Bearer.sh?


Yes, it is. Bearer relies on client-side instrumentation, which today is limited to just Node.js and Ruby applications. While we also support client-side instrumentation, the proxy is an important element of our offering because it is language-agnostic and enables a new class of features that can only be implemented in the proxy (e.g. caching).


Hi, I'm the co-founder of Bearer.sh.

Indeed, Bearer.sh works as a package (Gem, NPM) inside your application, and it automatically instruments your HTTP stack, meaning there are zero code changes to make to your existing integrations for it to work instantly.
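As a rough illustration of the agent approach, here's a hypothetical Python sketch (not our actual implementation; our Python support isn't out yet). The HTTP stack is wrapped once at import time, so existing call sites need no changes:

```python
import time

import requests

_original_send = requests.Session.send

def _instrumented_send(self, request, **kwargs):
    # Wrap every outgoing request to record URL, status, and latency.
    started = time.monotonic()
    response = _original_send(self, request, **kwargs)
    print({
        "url": request.url,
        "status": response.status_code,
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
    })
    return response

requests.Session.send = _instrumented_send  # patch applied once, globally
```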

But more interestingly, since we're not a proxy at all, you don't have to trust us to deliver that very important API traffic of yours (who would?). We have a sub-millisecond impact on your performance and work with any public, private, certificate-protected, or IP-restricted API! APIs are already a liability and a dependency for your app; let's not add us to that list!

We're going to launch support for many other stacks soon, and also a whole new set of "active features" as you mentioned, while still being 100% NOT a proxy. Stay tuned in the coming days :)

Feel free to try it: we offer 1M API calls per month for free, and you can quickly jump to 20M for only $49.

We're super happy to see all of the interest around this space these days. Let's change the API space altogether!


You said it better than I did. That was one of the issues we ran into: since we have third-party APIs that require IP whitelisting, certs, and VPNs, a proxy just won't work in those cases. Can't wait for the Python implementation...


Would it be right to say this is sentry.io meets envoy, grpc, and konghq? Super interesting. Congratulations.

How do I manage my API integrations, you ask?

Global Accelerator (GLA) is a key infrastructure piece for an HA service I'm building, but for the data plane. It is such a hassle-free (if slightly expensive) way to vend anycast IPs (no need to purchase ASNs and/or announce routes from colos across the globe) and have the traffic load-balanced to 25+ AWS regions that I recommend it instantly to anyone architecting HA services. https://fly.io and https://stackpath.com/edge-computing are viable alternatives. Cloudflare announced Magic Transit, which isn't as smooth as AWS GLA in terms of developer experience, whilst Azure and Google offer global load balancers too, maybe even from before AWS announced GLA in 2018? So, really, I think utilizing GLA is something folks should do if they run global HA services. The only issue with using NLB behind AWS GLA is that the client IP is not preserved. In our case we needed it, so we had to get creative with sticky routing and port assignment (listeners) to do load-balancing / traffic-shaping.

Another HA trick I plan to employ is to use Cloudflare Workers (200+ PoPs) to front HTTPS traffic to our control-plane endpoints. It lacks observability, monitoring, and alerting unless you're on Cloudflare's enterprise plans, and the rate-limiting option is expensive ($0.05 per 10k good requests). I'm sure there's no way to queue requests out of the box, so I can very much see a need for what you've built, and where you guys fit in.

To be honest, I'd be surprised if Firebase or API Gateway or KongHQ don't already do what you do as well. Is that the case? If so, keep at it. It is a real need, and, as you point out, something that I've had to build for every service and integration point.

A few questions (I went through your website and docs, but here I am):

- How do you handle secrets that the clients might need to share with your service, like Apikeys or Access/SecretKeys?

- Do you also push logs to customers, in addition to them pulling from your endpoints / UI?

- A bit curious about your logging, monitoring, and alerting infrastructure: is it run on top of CloudWatch, Prometheus, Loggly, Elasticsearch, Lightstep, or...?

- Do you support proxying http/REST APIs only?

https://autocode.stdlib.com/, which was discussed here a few weeks ago, looks to me like a good addition to what you're building.


Thanks for sharing your experience. We love GLA as well.

Great questions.

- For sensitive fields that you do not want retained or searchable, we can mask them out.

- We don't currently have integrations to push our logs to another service, but this is a good use case for us and it's on our near term roadmap.

- We use Elasticsearch in the product, but we also use CloudWatch extensively for our own operations.

- Right now we only support proxying HTTP requests, but are open to supporting other protocols.


Thanks a lot, Cameron. I'll watch this space [0] as you continue to add features and improve upon efficiency to pass on the cost savings to your customers :) All the best!

[0] I'd have opted for a newsletter, but I couldn't find any sign-up forms for it.


Is there an SLA roadmap?


We at Moesif (https://www.moesif.com/solutions/track-third-party-api) released a similar tool in 2017 and found that many of our customers, including Deloitte, UPS, iFit, and Trung's previous company, Snap Kitchen, were looking for a way to track APIs without the complexity of a full service mesh like Envoy, especially if you're hosted in something that cannot run an on-prem service mesh or gateway.

We're a little different in that we also support an agent-based approach rather than just a proxy, meaning we have an SDK that sits out-of-band.



