What makes a good API? A device providing an API is only the start of the story; if it’s unusable or unreliable, it’s useless. This post expands on one of a couple of points I made in response to Kirk Byers’ recent post on Arista’s API and the comments that followed. Much respect to (and admiration for) Arista for either already having these attributes or already having them in mind. I was also rather impressed by the positive and detailed comments they left in response to mine – very refreshing. Anyway, following in the footsteps of my previous article, The Attributes of a Great CLI (recently updated), here’s a list of what I think the attributes of a great network device API should be.
As ever, feedback is most welcome; I’ve hardly been thinking about this for years and I’m sure others can make some great suggestions. I’m rather more familiar with XML than I’d like, and less so with REST and JSON.
Nothing new here; we all know how valuable good documentation is. Two of the all-time most popular articles on this site are about it.
I’ve struggled with the documentation for APIs that are well over 10 years old; it still doesn’t include all the information required to make a successful call or request. Treating an API as a ‘nice to have’ rather than something critical doesn’t wash any more.
Additionally, rather than just listing name-spaces and calls, document the overall structure of the API too; that is very, very useful.
Useful Error Messages
Required attribute missing? Really? Well, you must know which one it is, so why don’t you tell me?
I can imagine that in some cases this is difficult for API (or even software) developers; who knows what crazy stuff people will try, especially the inexperienced. Still, the common stuff shouldn’t be too hard, right? If an attribute is expected and isn’t supplied, you know what it is, so tell me in the error message.
The time I’ve spent trying to ascertain whether something should be wrapped in one element or another really beggars belief.
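To make this concrete, here’s a minimal sketch in Python of validation that reports every problem by attribute name, rather than a bare “required attribute missing”. The call, its attributes (`name`, `vlan`, `enabled`) and the error format are all hypothetical; no particular vendor’s API is implied.

```python
import json

# Hypothetical schema for an "add interface" call; names are illustrative only.
REQUIRED = {"name": str, "vlan": int, "enabled": bool}


def validate(payload: dict) -> list[dict]:
    """Return one error entry per problem, naming the attribute and what was expected."""
    errors = []
    for attr, expected in REQUIRED.items():
        if attr not in payload:
            errors.append({"attribute": attr, "error": "missing",
                           "expected": expected.__name__})
        elif not isinstance(payload[attr], expected):
            errors.append({"attribute": attr, "error": "wrong type",
                           "expected": expected.__name__,
                           "received": type(payload[attr]).__name__})
    return errors


def handle_request(payload: dict) -> tuple[int, str]:
    """Return an HTTP status code and a JSON body."""
    errors = validate(payload)
    if errors:
        return 400, json.dumps({"status": "error", "errors": errors})
    return 200, json.dumps({"status": "ok"})
```

Something like `handle_request({"name": "eth1"})` returns a 400 whose body names both `vlan` and `enabled` as missing, which is all I was asking for.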
Useful Success Messages
OK, so I’ve made a successful call or request; at least, I think I have! I didn’t get an error, so success is implied, right? Sorry, that doesn’t do it for me at all. That is a nightmare.
Null and zero return values are not OK.
You want me to create something, then run another query to check it exists, and then write some logic to cope with the fact it might not? You might be keeping me in a job, but no thanks.
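A sketch of the alternative, again in Python and assuming a hypothetical in-memory `PROFILES` store standing in for the device: respond with HTTP 201 Created, a `Location` header pointing at the new object, and the object itself as the device now holds it, including any defaults it filled in.

```python
import json
import uuid

# Hypothetical in-memory store standing in for the device's configuration.
PROFILES: dict[str, dict] = {}


def create_profile(payload: dict) -> tuple[int, dict, str]:
    """Create a profile and echo back exactly what now exists on the device."""
    profile_id = str(uuid.uuid4())
    record = {
        "id": profile_id,
        "name": payload["name"],
        "timeout": payload.get("timeout", 300),  # default applied server-side, so report it
    }
    PROFILES[profile_id] = record
    headers = {"Location": f"/profiles/{profile_id}"}
    body = json.dumps({"status": "created", "profile": record})
    return 201, headers, body
```

No second query needed: the response already tells me the object exists, where it lives and what it looks like.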
Independently Monitor-able
We’ve rolled out a super shiny, fancy automation system with integrated monitoring and capacity management. I’m at the command line of some device and it seems sluggish. I check the CPU; it’s pretty high. I need to know why.
Whatever standard tool I can use to check what’s burning CPU cycles should be able to tell me if the API is an issue. The API itself should be able to tell me. My existing SNMP-based systems should be able to tell me.
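As a rough sketch of “the API itself should be able to tell me”, the hypothetical `ApiStats` class below keeps request and error counters and reports the process ID alongside CPU time, so figures from `top`, the API and any SNMP agent can be tied together. How a real device would expose this (a stats call, an SNMP table, or both) is an assumption here.

```python
import os
import threading
import time


class ApiStats:
    """Hypothetical self-reporting counters for an API server process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._started = time.monotonic()
        self.requests = 0
        self.errors = 0

    def record(self, ok: bool) -> None:
        """Count one handled request, and one error if it failed."""
        with self._lock:
            self.requests += 1
            if not ok:
                self.errors += 1

    def snapshot(self) -> dict:
        with self._lock:
            return {
                "pid": os.getpid(),  # lets standard process tools map CPU use back to the API
                "cpu_seconds": round(time.process_time(), 3),  # CPU consumed by this process so far
                "uptime_seconds": round(time.monotonic() - self._started, 3),
                "requests": self.requests,
                "errors": self.errors,
            }
```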
Performant
If I start polling for basic statistics via the API every 30 seconds (a bad idea, I know), I don’t want my devices falling over or running low on resources. If I request details of a very large routing table, I don’t want to wait so long that by the time I have it, it’s likely out of date. This closely relates to the Independently Monitor-able attribute.
Don’t use a web server that’s not up to the job; make sure it’s well tuned to the platform and that its integration with the statistics and configuration subsystems is performant. Support compression. Perhaps I’m asking for too much, but how about supporting SPDY, or HTTP/2.0 when it arrives?
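On compression, the mechanics are simple enough. A minimal Python sketch, assuming a hypothetical `encode_response` helper on the device’s web server, that gzips a JSON body when the client’s `Accept-Encoding` header allows it:

```python
import gzip
import json


def encode_response(data: object, accept_encoding: str) -> tuple[dict, bytes]:
    """Serialise to compact JSON and gzip it when the client advertises support."""
    body = json.dumps(data, separators=(",", ":")).encode("utf-8")
    # Simplified check: a real server should also parse q-values such as "gzip;q=0".
    if "gzip" in accept_encoding.lower():
        headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
        return headers, gzip.compress(body)
    return {"Content-Type": "application/json"}, body
```

Large, repetitive responses such as routing tables tend to compress very well, which means less time on the wire and less waiting for data that’s already ageing.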
Data Plane Independent/Resource Constrained
Oh dear, I lost all my OSPF neighbours because I made a ‘heavy’ request that consumed all the device’s CPU for a couple of minutes.
This is another one that I’m guessing is harder than it sounds, but I don’t think it’s unreasonable to ask for a guarantee that the API can’t affect the data plane and traffic forwarding. If it can, then in some ways it’s just another DoS vector. This closely relates to the Performant and Independently Monitor-able attributes.
Even better, let me configure (via API or other methods) resource constraints such as a maximum connection rate, idle time-outs and similar. Take me to the top with the ability to restrict bandwidth usage too.
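A token bucket is one common way to enforce a maximum connection (or request) rate while still allowing short bursts; this is a generic Python sketch, not how any particular device does it. The `rate` and `capacity` values would be the operator-configurable knobs, and the injectable `clock` just makes it testable.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` events, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if one more connection may be accepted right now."""
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A server would call `allow()` for each new connection and refuse it (for example with HTTP 429 Too Many Requests) when it returns `False`.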
The API must be secure, and that security must be configurable.
If I find out SSLv3 has become insecure, I really want to be able to change something so my API isn’t relying on it. If I want to use ECDHE because it’s faster, or for PFS, or whatever, let me. Let’s not forget my network device might be a security device (not sure if firewalls count these days).
Equally, if I need to debug something, I want the option to drop the cipher string down to something weaker (RSA key exchange, no forward secrecy) that I can decrypt from a packet capture using tools like Wireshark or ssldump. Better yet, let me enable DEBUG level logging so I don’t even need to bother with that. Let me do this via the API.
Client-side tools may avoid the need for packet capture or detailed logging but, regardless, we all want control here, not ‘trust me’.
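To illustrate the kind of control I mean, here’s a Python sketch using the standard `ssl` module of two server-side profiles a device might let you switch between via its API: a normal one that refuses anything older than TLS 1.2 and only offers ECDHE (forward secret) AES-GCM suites, and a temporary debug one limited to TLS 1.2 with RSA key exchange, so a capture can be decrypted with the server’s private key. The function name and the idea of an API-driven switch are assumptions; a real server would also load its certificate with `load_cert_chain()`.

```python
import ssl


def make_api_tls_context(debug: bool = False) -> ssl.SSLContext:
    """Server-side TLS settings an operator might want to control through the API."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # SSLv3, TLS 1.0 and TLS 1.1 are refused
    if debug:
        # Temporary troubleshooting profile: TLS 1.2 only, RSA key exchange
        # (no forward secrecy), so traffic can be decrypted with the private key.
        ctx.maximum_version = ssl.TLSVersion.TLSv1_2
        ctx.set_ciphers("AES128-GCM-SHA256")
    else:
        # Normal profile: ECDHE key exchange (forward secrecy) with AES-GCM.
        ctx.set_ciphers("ECDHE+AESGCM")
    return ctx
```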
Returning a payload in an error message is not OK if what I sent contains a password.
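A minimal Python sketch of the fix, assuming a hypothetical set of sensitive attribute names: mask them before any part of the request is echoed back in an error.

```python
# Hypothetical attribute names treated as secrets; a real platform would extend this.
SENSITIVE = {"password", "secret", "community", "key", "token"}


def redact(value):
    """Return a copy of a request payload with secret-looking fields masked."""
    if isinstance(value, dict):
        return {k: "********" if str(k).lower() in SENSITIVE else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value  # scalars are returned as-is


def error_response(message: str, payload: dict) -> dict:
    """Build an error body that echoes the request without leaking secrets."""
    return {"status": "error", "message": message, "request": redact(payload)}
```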
Also, please test for DoS vectors etc.
Useful
Providing about half the capabilities of the CLI isn’t helpful. I don’t want to mix and match; who would?
Transactions & Roll-back
This is invaluable. Allow me to collect any number of changes (or whatever) into a collection, a transaction. If any one change, modification or deletion fails, roll them all back. No need for that neighbour statement if the interface it relies upon never came up.
Doing this yourself in some fashion is a real pain.
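For anyone stuck doing it themselves, the core idea is to remember the old value of everything you change and undo in reverse order if any step fails. A Python sketch against a hypothetical in-memory config; on a real device each step would be an API call and undo is rarely this tidy. `require_interface_up` is a made-up check mirroring the neighbour example above.

```python
_MISSING = object()  # sentinel: "this key did not exist before the transaction"


class Transaction:
    """Collect changes to a config mapping; apply all of them or none of them."""

    def __init__(self, config: dict) -> None:
        self.config = config
        self.changes: list[tuple[str, object]] = []

    def set(self, key: str, value: object) -> "Transaction":
        self.changes.append((key, value))
        return self  # allow chaining

    def commit(self, check=lambda key, value, config: None) -> None:
        undo: list[tuple[str, object]] = []
        try:
            for key, value in self.changes:
                check(key, value, self.config)  # raises to reject this change
                undo.append((key, self.config.get(key, _MISSING)))
                self.config[key] = value
        except Exception:
            # Undo in reverse order so the config ends up exactly as it started.
            for key, old in reversed(undo):
                if old is _MISSING:
                    del self.config[key]
                else:
                    self.config[key] = old
            raise


def require_interface_up(key: str, value: object, config: dict) -> None:
    """Example check: an OSPF neighbour is only valid once eth1 is up."""
    if key.startswith("ospf.neighbour") and config.get("interface.eth1.state") != "up":
        raise ValueError(f"{key}: interface eth1 is not up")
```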
If I can see that, for some reason, the API is consuming excessive resources, not responding in a reasonable time or whatever, let me restart it. As per Useful, don’t force me back to the CLI and/or the service command or some other method. Let the API rule.
I came across this just today with a REST API. I issued a GET against a specific name-space, essentially requesting a list of ‘sub-commands’ (I’m not sure I’m happy with this name) that I could make further calls to. What did I get back? A completely unorganised list! No alphabetical order of name-spaces, no order of any kind. Talk about making my job of parsing the response harder. This isn’t something I’d want to do programmatically, which makes it even worse; what shall I do, whack the response into Excel and try to sort it? This doesn’t help when ‘exploring’ a new API.
If output is considerable and, more importantly, can be sorted in some manner (alphabetically, numerically or whatever) without affecting the usefulness of the response, please sort it. Sure, if I ask for a list of routes, I’ll want to deal with that in my own way, so do what you want. But if I’m requesting a list of profiles and they are uniquely identified by name, give me a list (or dictionary) sorted from a to z. Much appreciated.
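Sorting costs the device very little. A Python sketch assuming profiles are dicts with a unique `name` key (the sample names below are made up); since Python dicts keep insertion order, the resulting JSON object comes out a to z:

```python
import json


def list_profiles(profiles: list[dict]) -> str:
    """Return profiles keyed by their unique name, ordered a to z (case-insensitive)."""
    ordered = sorted(profiles, key=lambda p: p["name"].casefold())
    # Dicts preserve insertion order, so json.dumps emits the keys in sorted order.
    return json.dumps({p["name"]: p for p in ordered}, indent=2)
```

For example, `list_profiles([{"name": "tcp"}, {"name": "HTTP"}, {"name": "ftp"}])` yields keys in the order `ftp`, `HTTP`, `tcp`.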
It would be great to get something that told me how many records had been returned; then perhaps I could reserve some CPU or set a timer or something based on that.
I’m not sure this is something I want, but others seem to. Return 20 records at a time, or let me request only that many and ask for more if I want. It’s all about control and data at the end of the day.
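Both of these fit in one response envelope. Here is a Python sketch with hypothetical `offset`/`limit` parameters and made-up field names. Every page carries the total number of records, the number in this page, and where to start the next request.

```python
import json


def paginate(records: list, offset: int = 0, limit: int = 20) -> str:
    """Return one page of records plus the counts a client needs to plan ahead."""
    # The 1000-record ceiling is an arbitrary, illustrative server-side cap.
    if offset < 0 or not 1 <= limit <= 1000:
        raise ValueError("offset must be >= 0 and limit between 1 and 1000")
    page = records[offset:offset + limit]
    next_offset = offset + len(page)
    return json.dumps({
        "total": len(records),  # how many records exist in all
        "count": len(page),     # how many are in this response
        "offset": offset,
        "next_offset": next_offset if next_offset < len(records) else None,
        "records": page,
    })
```

Offset-based paging is the simplest to show. For data that changes between requests, such as routes, a cursor or snapshot token would avoid skipped or duplicated records.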
Clear Scope & Context
Another one I came across today. This one really ‘messed with my head’; I was basically making a request to a management device, asking it to ‘discover’ another device that I wanted it to eventually manage. I got an HTTP 202 response to my request, which made sense: “we got it, working on it” in plain English, oh, and we assigned a UUID of xxxx. OK, nice, I’ll move on to querying that UUID. What did I get back? Well, I got status: FINISHED and status: FAILED. FINISHED referred to the discovery ‘job’; FAILED referred to the outcome of that job. It would be very helpful if it were something like this instead: jobstatus: FINISHED and joboutcome: FAILED.
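In code, the fix is just to give the two ideas separate fields with separate value sets. Here is a Python sketch using the `jobstatus`/`joboutcome` names suggested above. The enum values, `DiscoveryJob` and `discovery_succeeded` are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobStatus(Enum):   # where the job is in its lifecycle
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


class JobOutcome(Enum):  # what the job achieved; only meaningful once finished
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class DiscoveryJob:
    uuid: str
    jobstatus: JobStatus
    joboutcome: Optional[JobOutcome] = None
    detail: str = ""

    def to_json(self) -> str:
        return json.dumps({
            "uuid": self.uuid,
            "jobstatus": self.jobstatus.value,
            "joboutcome": self.joboutcome.value if self.joboutcome else None,
            "detail": self.detail,
        })


def discovery_succeeded(body: str) -> Optional[bool]:
    """None while the job is still in progress; otherwise whether discovery worked."""
    job = json.loads(body)
    if job["jobstatus"] != JobStatus.FINISHED.value:
        return None
    return job["joboutcome"] == JobOutcome.SUCCEEDED.value
```

A client polling the job UUID then can’t mistake “the job finished” for “the discovery worked”.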
Nothing has pained me more recently than having to accommodate an API’s requirement for <item> in some instances and <items> in others, regardless of whether there can be multiple values or not. If there can be only one, use <item>; if there can be more than one, use <items>. Be logical here. There can only be ONE base license, but many add-on licenses. Only one serial number. If there is a ‘status’, be clear about what it refers to (as per Clear Scope & Context).
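Here is the license example as a Python sketch, using the standard `xml.etree.ElementTree` module. The element names are hypothetical. Single-valued items get singular names. The collection always uses the plural wrapper, even when it holds one entry or none, so client code never has to guess:

```python
import xml.etree.ElementTree as ET


def license_report(serial: str, base: str, add_ons: list[str]) -> str:
    """Singular element names for single values, a plural wrapper for collections."""
    device = ET.Element("device")
    ET.SubElement(device, "serialNumber").text = serial  # exactly one
    ET.SubElement(device, "baseLicense").text = base     # exactly one
    # Zero or more: always the plural wrapper, even for a single item or none.
    wrapper = ET.SubElement(device, "addOnLicenses")
    for name in add_ons:
        ET.SubElement(wrapper, "addOnLicense").text = name
    return ET.tostring(device, encoding="unicode")
```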
Consistent Data Model
As noted by the guys from Arista in the comments below, a consistent data model is also essential. Returned values should be in a useful format and consistent across different calls or commands. If free memory is returned in MB, then total memory should also be returned in MB, not KB or GB.
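Here is a Python sketch of normalising at the API boundary. It assumes binary multiples (1 KB = 1,024 bytes) and uses hypothetical field names. Putting the unit in the field name (`free_bytes`) also removes any doubt:

```python
# Binary multiples assumed here; a real API should document which convention it uses.
_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(value: float, unit: str) -> int:
    """Convert a (value, unit) pair reported by some subsystem into bytes."""
    return int(value * _FACTORS[unit.upper()])


def memory_stats(free: tuple[float, str], total: tuple[float, str]) -> dict:
    """Whatever units the subsystems report in, callers always see bytes."""
    free_b, total_b = to_bytes(*free), to_bytes(*total)
    return {
        "free_bytes": free_b,
        "total_bytes": total_b,
        "used_percent": round(100 * (total_b - free_b) / total_b, 1),
    }
```

For example, `memory_stats(free=(512, "MB"), total=(2, "GB"))` gives 536870912 free bytes, 2147483648 total bytes and 75.0 percent used.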
The ‘producers’ of your API: have they ever tried to read from and write to the API? Actually used it to do something useful? Do that, then come back to me with the improvements; there will be many. I don’t want to pay to do your testing.
The device (the data plane) probably already does in some fashion, as do the CLI and access control system. Why not the API? Self-service and many other features rely on this, and if your device isn’t playing ball, well, it might not be in the game for long.
Forgive me the somewhat aggressive format; this all seems rather obvious to me and, even in my limited experience, I’ve paid quite a high price for the lack of such features. I also feel, considering the fundamental role of networking, that bad APIs will really take the shine off (and dollars from) emerging solutions, automation projects, PoCs and all the rest. That would be a real shame, ultimately delaying improvements in our industry and our jobs (and costing the worst vendors some serious money).
As I said, I’m very open to both suggestions and any other kind of comment. Thanks.