HTTP Query Params 101
Summary
A long time ago, we had simpler lives: monolithic apps talking to relational databases, where SQL let us express myriad conditions through the WHERE clause. As time progressed, every application became a web app, and we started developing HTTP services that talk JSON and are consumed by a variety of client applications such as mobile clients, browser apps etc. So some of the filtering that we were doing via SQL WHERE clauses now needs a way to be represented via HTTP query parameters. This blog post tries to explain the use of HTTP query parameters to newbie programmers, via some examples. This is NOT a post on how to cleanly define/structure your REST APIs; the aim is just to give an introduction to HTTP query parameters.
Action
Let us build an online ebook store. For each book in our database, let us have the following data:
BookID - String - Uniquely identifies a book
Title - String
Authors - String Array
Content - String - Base64-encoded content of the book
PublishedOn - Date
ISBN - String
Pages - Integer - Number of pages in the book
Let there be an API to get a list of books. It would be something like:
GET https://api.example.com/books
The above API will return all the information about each book in our system, except the Content. Though this would work for a small book shop, if you have, say, a billion books, this puts unnecessary stress on the server, the client and the network, which all have to carry all the book data when the user probably does not want to see more than, say, 10 titles in most cases. So our API now needs a way to return only N titles, and also a way to specify the position M from which those N titles should start. These fields are usually called limit and offset. So our API becomes:
GET https://api.example.com/books?offset=5&limit=10
Here we have added two fields, offset and limit, to our API. However, there are two things that are unclear in this API definition.
The first ambiguity is: we do not know which field will be used to determine the sequence of the books. Is it the BookID? Is it the PublishedOn date? The former is a string; how do we sort it to find the order (alphabetically, and case-sensitively or case-insensitively)? The latter is a date field, and there can be multiple books with the same published date. So how do we ensure that a book will always occupy the same position in the sort order across two different HTTP requests?
The second ambiguity is: what if these fields are not specified, or are specified with invalid values? How does the API handle that?
To solve both of these ambiguities, our API docs need to become more precise. One possible solution (out of many) to the first ambiguity: we only ever generate lower-case BookIDs (doing a toLower conversion where needed), we always use BookID as the sort key, and we always sort in ascending order. Our BookIDs are monotonically increasing; in other words, once we have given the BookID "abc" to a book, we will never generate the BookID "aba" again.
Instead of a unique String ID, we could also use a numeric field, which maps directly to a database AUTO_INCREMENT or BIGSERIAL column, and ORMs can then map the offset and limit parameters automatically.
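To make this concrete, here is a minimal sketch of how a Node backend could translate these two parameters into SQL. The table, column and driver names are illustrative assumptions, not part of the API above:

const { Pool } = require('pg'); // assuming the node-postgres driver; any SQL client works
const pool = new Pool();

// List books sorted by BookID ascending, as our API docs now promise.
async function listBooks(offset, limit) {
  const { rows } = await pool.query(
    'SELECT book_id, title, authors, published_on, isbn, pages ' +
    'FROM books ORDER BY book_id ASC LIMIT $1 OFFSET $2',
    [limit, offset]
  );
  return rows;
}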
Note that it is not uncommon to add "sort_by" and/or "order_by" parameters, through which clients can change the sort field (say, PublishedOn instead of BookID) and also the sorting order (ascending or descending). There are multiple ways to represent this via query parameters. Some examples are listed below, followed by a sketch of how a server might parse one of these notations:
Sort by Title (default ascending):
GET https://api.example.com/books?sort_by=title
Sort by Title (explicitly ascending):
GET https://api.example.com/books?sort_by=asc(title)
GET https://api.example.com/books?sort_by=+(title)
GET https://api.example.com/books?sort_by=title.asc
Sort by Title (explicitly descending):
GET https://api.example.com/books?sort_by=desc(title)
GET https://api.example.com/books?sort_by=-(title)
GET https://api.example.com/books?sort_by=title.desc
Sort by Multiple Fields:
GET https://api.example.com/books?sort_by=asc(title),desc(published_on)
GET https://api.example.com/books?sort_by=title,-published_on
GET https://api.example.com/books?sort_by=+title,-published_on
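As an illustration, here is how a server might parse the last notation into a list of sort instructions. The allowlist of sortable fields is an assumption made up for this example:

// Parse "title,-published_on" into [{ field, dir }, ...].
const SORTABLE = new Set(['book_id', 'title', 'published_on']);

function parseSortBy(sortBy) {
  return sortBy.split(',').map(raw => {
    // A literal '+' must be sent percent-encoded as %2B, since a bare '+'
    // decodes to a space in query strings; we tolerate both forms here.
    const part = raw.trim().replace(/^\+/, '');
    const dir = part.startsWith('-') ? 'desc' : 'asc';
    const field = part.replace(/^-/, '');
    if (!SORTABLE.has(field)) throw new Error(`cannot sort by ${field}`);
    return { field, dir };
  });
}

// parseSortBy('title,-published_on')
// → [ { field: 'title', dir: 'asc' }, { field: 'published_on', dir: 'desc' } ]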
For the second ambiguity, the safest solution is to make our HTTP APIs return 400 whenever we come across invalid data, for example an invalid starting offset. We also need to explicitly document the default values that apply when these query parameters are not specified.
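For illustration, here is a minimal Express handler that applies documented defaults and rejects invalid values with a 400. The defaults and the upper bound are assumptions for the example, not requirements:

const express = require('express');
const app = express();

app.get('/books', (req, res) => {
  // Documented defaults: offset=0, limit=10; limit capped at 100.
  const offset = req.query.offset === undefined ? 0 : Number(req.query.offset);
  const limit = req.query.limit === undefined ? 10 : Number(req.query.limit);
  if (!Number.isInteger(offset) || offset < 0 ||
      !Number.isInteger(limit) || limit < 1 || limit > 100) {
    return res.status(400).json({ error: 'invalid offset or limit' });
  }
  // ... apply filters and sorting, then offset/limit, and respond ...
  res.json({ offset, limit, books: [] });
});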
Filters
We have used GET above to fetch the books' information and to limit how many results come back. However, there may be a need for other filters. For example, we may want only the books by a particular author. So we could add more parameters, such as:
GET https://api.example.com/books?author=crichton
Here author is a query parameter which takes a string as an argument. This API will return any book whose author name matches "crichton". Also note that individual filters can be combined with other filters, for example:
GET https://api.example.com/books?author=crichton&offset=0&limit=5
will return the first five books by the author "crichton". So the backend implementation should apply limit and offset after applying the author="crichton" filter. The API docs need to state unambiguously which positions the offset refers to when other filter conditions are present. The other choice is to return whichever books by "crichton" happen to fall within the first five results of the full, unfiltered list. You can choose either practice as long as all your APIs apply it consistently, though I prefer the former; a sketch of it follows.
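Continuing the illustrative Node/SQL sketch from earlier, the former choice simply means the filter lands in the WHERE clause, which SQL evaluates before LIMIT/OFFSET:

// Filter first, then paginate. Authors is a string array in our data
// model, so we match any element (PostgreSQL's ANY; schema is illustrative).
const { rows } = await pool.query(
  'SELECT book_id, title, authors, published_on, isbn, pages ' +
  'FROM books WHERE $1 = ANY (authors) ORDER BY book_id ASC LIMIT $2 OFFSET $3',
  ['crichton', 5, 0]
);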
More Filter conditions
In the above API definition, we were returning the books whose author was exactly "crichton". However, an exact-equals condition is not always what we want; our API may need to accept query parameters that are applied loosely. For example, the author may be stored in our system as "Michael Crichton", so matching "crichton" exactly would not be sufficient. Similarly, we may need the list of all books published after 2005 but before 2015.
Our query parameters may therefore need to support, in addition to equal-to: less-than, less-than-or-equal-to, greater-than, greater-than-or-equal-to, not-equal-to, contains (for string matches), not-contains and so on.
Our query parameters now need to carry these operators in addition to the parameter name and the desired value(s). One possible approach is to attach the operator name to the query parameter itself. For example:
GET https://api.example.com/books?published_on[gte]=2005-01-01&published_on[lte]=2015-12-31
GET https://api.example.com/books?published_on.gte=2005-01-01&published_on.lte=2015-12-31
With the bracket notation, parsing libraries such as the Node qs module hand you a nested object directly (note that the dates are sent unquoted; quotes would become part of the value):
const qs = require('qs');
const assert = require('assert');
assert.deepEqual(qs.parse('published_on[gte]=2005-01-01&published_on[lte]=2015-12-31'),
  { published_on: { gte: '2005-01-01', lte: '2015-12-31' } });
Note that I am using YYYY-MM-DD as the date format. It is strongly recommended to use a single date format for all your APIs, whichever format you choose. Similarly, while working with time, choose a single timezone, preferably UTC.
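For example, in Node it is easy to normalize timestamps to UTC and ISO 8601 before they ever reach a query string (output values shown are illustrative):

const now = new Date();
console.log(now.toISOString());              // e.g. '2019-01-25T10:30:00.000Z' (UTC)
console.log(now.toISOString().slice(0, 10)); // e.g. '2019-01-25' (YYYY-MM-DD)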
Instead of changing the parameter names, we can also prefix the operator to the value on the right-hand side of the equals sign. For example:
GET https://api.example.com/books?published_on=gte:2005-01-01&published_on=lte:2015-12-31
Here "gte" and "lte" still denote the operators, but they are specified on the right-hand side of the equals sign, as part of the value. Note that the same parameter now appears twice; most query-string parsers will hand the repeated values to you as an array.
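A sketch of splitting such values back into operator and operand (assuming the parser delivers repeated parameters as an array, as qs and Express do):

// parseOps(['gte:2005-01-01', 'lte:2015-12-31'])
// → [ { op: 'gte', value: '2005-01-01' }, { op: 'lte', value: '2015-12-31' } ]
function parseOps(values) {
  return [].concat(values).map(v => {
    const i = v.indexOf(':');
    return i === -1
      ? { op: 'eq', value: v }                         // bare value: assume equal-to
      : { op: v.slice(0, i), value: v.slice(i + 1) };  // 'gte:2005-01-01' → gte
  });
}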
If you have a long list of filters and operators, you should probably avoid HTTP query parameters altogether. An API that receives these complex queries, perhaps in your own DSL (as JSON or any other serialisable format), in the HTTP request body makes code maintenance simpler. Elasticsearch uses this method.
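For example, the request above could travel as a JSON body instead. The DSL here is made up for illustration, loosely inspired by Elasticsearch's query DSL:

POST https://api.example.com/books/search
{
  "filter": {
    "and": [
      { "field": "author",       "op": "contains", "value": "crichton" },
      { "field": "published_on", "op": "gte",      "value": "2005-01-01" },
      { "field": "published_on", "op": "lte",      "value": "2015-12-31" }
    ]
  },
  "sort": [{ "field": "title", "dir": "asc" }],
  "offset": 0,
  "limit": 5
}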
Conclusion
I have been trying to write this for some time now but kept deferring it for months. So I decided to just type it out today in one stretch. The post is not as polished as I wanted it to be, but it is better to publish an unrefined post than to write nothing at all. I hope you have found it useful. Let me know if you have any comments, feedback or corrections. Thanks for reading this far.
PEAR down – Taking Horde to Composer
Since Horde 4, the Horde ecosystem has relied heavily on the PEAR infrastructure. Sadly, this infrastructure is in bad health. It’s time to add alternatives.
Everybody has noticed the recent PEAR break-in.
A security breach has been found on the http://pear.php.net webserver, with a tainted go-pear.phar discovered. The PEAR website itself has been disabled until a known clean site can be rebuilt. A more detailed announcement will be on the PEAR Blog once it’s back online. If you have downloaded this go-pear.phar in the past six months, you should get a new copy of the same release version from GitHub (pear/pearweb_phars) and compare file hashes. If different, you may have the infected file.
As I write these lines, pear.php.net is down. Retrieval links for individual PEAR packages are down. Installing PEAR packages is still possible from private mirrors or Linux distribution packages (openSUSE, Debian, Ubuntu). Separate PEAR servers like pear.horde.org are not directly affected. However, a lot of PEAR software relies on one or more libraries from pear.php.net – it’s a tough situation. Many software projects have moved on to composer, an alternative solution for dependency distribution. However, some composer projects in turn depend on PEAR channels.
I am currently submitting some changes to Horde upstream to make Horde libs (both released and from git) more usable from composer projects.
The short-term goal is to make using some highlight libraries easier in other contexts. For example, Horde_ActiveSync and Horde_Mail, Horde_Smtp, Horde_Imap_Client are really shiny. I use Horde_Date so much that I even introduced it into some non-Horde software – even though most of its functionality is also available somewhere in PHP native classes.
The ultimate goal, however, is to enable Horde groupware installations out of composer. This requires more work to be done. There are several issues:
- The db migration tool checks for some PEAR path settings at runtime (https://github.com/horde/Core/pull/2). Most likely there are other code paths which need to be addressed.
- Horde libraries should not be web-readable, but Horde apps should be in a web-accessible structure. Traditionally, they are installed below the base application (the “horde dir”), but they can also be installed to separate dirs.
- Some libraries like Horde_Core contain files, such as javascript packages, which need to be moved or linked to a location inside another package. Traditionally, this is handled either by the “git-tools” tool, which links the code directory to a separate web directory, or by PEAR, which places various parts of the package under different root paths. Composer doesn’t offer that out of the box.
Horde has already been generating composer manifest files for quite a while. Unfortunately, they were thin wrappers around the existing PEAR channel. The original generator even took all package information from the PEAR manifest file (package.xml) and converted it – which means it relied on a working PEAR installation. I wrote an alternative implementation which converts directly from .horde.yml to composer.json, calling the packages by their composer-native names. As Horde packages have not been released on packagist yet, the composer manifest also includes repository links to the relevant git repositories. This should later be disabled for releases and only turned on in master/head scenarios. Releases should be pulled from the packagist authority, which is much faster and less reliant on existing repository layouts (https://github.com/horde/components/pull/3).
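For illustration, a generated manifest could look roughly like this. The package names and version constraints are made up for the example; the point is the custom type and the repository links:

{
  "name": "horde/mnemo",
  "description": "A note manager (illustrative entry)",
  "type": "horde-application",
  "require": {
    "horde/core": "dev-master",
    "horde/date": "dev-master"
  },
  "repositories": [
    { "type": "vcs", "url": "https://github.com/horde/Core" },
    { "type": "vcs", "url": "https://github.com/horde/Date" }
  ]
}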
To address the open points, composer needs to be amended. I currently generate the manifests using the package types “horde-library” and “horde-application” – I also added a package type “horde-theme”, for which no precedent exists yet. Composer doesn’t understand these types unless one adds an installer plugin (https://github.com/maintaina-com/installers). Once completed and accepted, this should be upstreamed into composer/installers. The plugin currently handles installing apps to appropriate places rather than to /vendor/ – however, I think we should avoid having a super-special case “horde-base” and default to installing apps directly below the project dir. Horde base should also live in the same hierarchy. This needs some additional tools and autoconfiguration to make it convenient. There is still a long way to go.
That said, I don’t think PEAR support should be dropped anytime soon. It’s the most sensible way to package PHP software for distributions. As long as we can bear the cost involved in keeping it up, we should try.
Raspberry Pi: "Bluetooth: hci0 link tx timeout"
Bluetooth: hci0 link tx timeout
Bluetooth: hci0 killing stalled connection xx:xx:xx:xx:xx:xx
This happened with different USB Bluetooth dongles. Googling the problem mostly turned up unrelated articles, or advice that was obviously plain wrong.
Long story short: moving the dongle to a powered USB hub solved the issue.
(Just for the record: the raspi is powered by a good power supply...)
wlc 1.0
wlc 1.0, a command line utility for Weblate, has just been released. The most important change is that the release is marked stable – this is the actual 1.0. It has been around long enough to prove its stability.
Full list of changes:
- Marked as stable release.
- Added support for more parameters on file upload.
wlc is built on top of the Weblate API; you can use it with Weblate 2.10 or newer, though some features might require a more recent version. Of course, you can use it with our hosting offering. Usage examples can be found in the wlc documentation.
Governance on demand
Why that?
Nadia suggests a theory in the last footnote of her post: "that projects only need to define governance at the first sign of conflict". Intuitively, this makes immediate sense. We have all seen projects which seem to work just fine without any thought about governance, and we have also seen projects where attempts to set up formal governance brought things to a halt instead of serving the project. So doing it at the last responsible point in time, when you actually need it, sounds like a very attractive model.
Being able to add governance on demand needs a high level of awareness and reflection. It also needs a culture which is open to the idea of governance, has the means to facilitate discussions about it, and is able to come to a conclusion. It is the point where you have to "decide to decide".
This is not easy, especially in the context of a conflict. It can be paralysing. Making decisions without defined structures, without precedent, takes responsibility and courage. Maybe not everybody will go along with it. You don't know, because you haven't done it before.
One model which seems to be a quite natural outcome of such a "we need governance, now" situation, is the "benevolent dictator". When conflict arises, the founder or another exposed person steps in and takes a decision. This sets a trajectory for the project, which might be right or not. It depends on the project, on the people, on the environment.
Another model which comes naturally is to follow the "those who do the work decide" principle. This adds local, high context governance. It has to be underpinned by common values and a common sense of direction, though. Otherwise it will fail to solve the kind of conflicts where active people seem to stand against each other.
If you have a strong culture, it might appear you don't need governance. If you have shared values, if you have a common mission, if people learn by imitating healthy behavior from others, then it's easy to take decisions and to preempt conflicts. This could also be called a state of implicit governance, because it is there, but it's not formulated.
If you have a strong culture, then you are also prepared to add governance on demand. This can become necessary because of growth, a changing environment, or other factors which can't be addressed by existing intuition.
From this point of view: Build culture first and governance will follow.
These are my thoughts. I would be more than happy to hear about your thoughts as well.
Footnote: In some way, "governance on demand" is not a governance model in itself, but more of a meta model. It doesn't define what the project's governance then has to look like; it only answers part of the question of how to get there. It is in the nature of governance models to also cover this meta level, though. Maybe "governance on demand" is more a governance element than a model in itself: it governs the evolution of governance models.
Node Script to Display Cookies
With some advice from StackOverflow, I wrote a short Node script that I placed in the file $HOME/bin/get-cookies.js, with the executable bit set via chmod +x $HOME/bin/get-cookies.js. It relies on the puppeteer library to control a headless Chromium instance, which must be installed first via npm i puppeteer.
Then you can call get-cookies.js https://google.com to list all cookies that get set when requesting the page given as the parameter (here: https://google.com). Note that Puppeteer creates its own Chromium user profile, which it cleans up on every run.
Source Code of ‘get-cookies.js’
#!/usr/bin/env node
const puppeteer = require('puppeteer');
const url = process.argv[2]; // page to visit, e.g. https://google.com
(async () => {
  const browser = await puppeteer.launch({ headless: true, args: ['--disable-dev-shm-usage'] });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Query the DevTools protocol directly: unlike page.cookies(), this
    // returns the cookies of all domains touched while loading the page.
    // Note: page._client is a private puppeteer API and may change.
    const { cookies } = await page._client.send('Network.getAllCookies');
    cookies.forEach(cookie => {
      // "expires" is a UNIX timestamp in seconds; add a readable UTC variant.
      cookie.expiresUTC = new Date(cookie.expires * 1000);
    });
    // Session cookies vanish when the browser closes; keep the persistent ones.
    const persistentCookies = cookies.filter(c => !c.session);
    console.log({
      persistentCookies: persistentCookies,
      persistentCookiesCount: persistentCookies.length,
    });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
  }
})();