We’ve talked about Newsday’s data inventory and how we conducted research to find a business use case for it. But all of this work has to live on a platform somewhere. Our platform is still under active development, but one benefit of adopting an open-source approach is that we can share our ongoing work and even encourage others to participate in it.
This will be a slightly more technical look behind the scenes, but we hope it gives others interested in building their own data platform a general idea of what’s possible. We plan to release a detailed guide when we officially launch our platform, but INMA members get an early preview here.
Newsday’s data platform at a glance
Our data platform currently consists of three Docker containers (Docker is containerisation software that packages each application together with its dependencies):
- A Django installation to handle logins and authentication.
- A Datasette installation that powers all our data work.
- And a “sidecar” that backs up our data.
Both Django and Datasette are open-source projects, which lets us tap into a wealth of existing plugins and communities while also contributing to their growth. The sidecar is a scheduler that runs a shell script to back up our data on a regular schedule.
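To make the sidecar idea concrete, here is a minimal sketch of a single backup run. Our actual sidecar uses a shell script, so this Python version is an illustration rather than our implementation. It relies on the fact that Datasette serves SQLite database files and uses the online backup support in Python's standard `sqlite3` module. The function name `backup_database`, the timestamped file naming, and the directory layout are all assumptions made for this example.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_database(source: Path, backup_dir: Path) -> Path:
    """Copy a SQLite database into backup_dir under a timestamped name.

    Hypothetical helper for illustration only; not the script our sidecar runs.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp so backup files sort chronologically by name.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{source.stem}-{stamp}.db"

    src = sqlite3.connect(str(source))
    dst = sqlite3.connect(str(target))
    try:
        # Connection.backup copies the database page by page, so the copy
        # stays consistent even if the source is being read at the same time.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

A scheduler such as cron could call a function like this at a fixed interval, once for each database file the platform serves.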
Of the three, the real star of the show is Datasette.
Datasette in more detail
Datasette is an open-source data exploration and publishing tool built by Simon Willison, a developer with a history of building tools for newsrooms. It works fantastically in its current state but has some limitations as a multi-user Web application simply because that was not its initial priority.
Some features we take for granted in most Web platforms these days were missing. For example, there was no easy way to change Datasette’s settings without restarting the application, adding data to the platform required command-line work, and all data on the platform was assumed to be public.
For Newsday, we needed some of those missing features and more. Thankfully, Datasette was designed to be extensible. We worked with Brandon Roberts to build a series of plugins we call our “Datasette Live” plugins. These plugins are free and open source, and they address the limitations described above.
One plugin, however, deserves a special mention, and that is Datasette Live Permissions. By default, Datasette assumes all data on it is public. This presented a problem as we wanted our platform to host all of our data, much of which may not be ready for the public.
With this plugin, however, we can create users and groups, and grant or restrict access to data as we see fit. This allows us to make some data public for our readers while keeping work-in-progress datasets secure.
It also means we can create teams within Datasette so reporters working on sensitive projects, such as our investigations team, can collaborate on their data without exposing it to the rest of the newsroom. At the same time, they can easily publish it once it’s ready.
And if we take the idea of groups and teams a step further, this plugin also opens up the potential for subscriber-only access to some of our data.
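The access model described above (public data, team-only data, and potentially subscriber-only data) can be sketched as a simple group check. This is a simplified illustration in plain Python, not the Datasette Live Permissions plugin's actual code or data model. The `User` and `Dataset` classes, the `can_view` function, and the group names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset = frozenset()  # e.g. {"investigations", "subscribers"}


@dataclass(frozen=True)
class Dataset:
    name: str
    public: bool = False
    allowed_groups: frozenset = frozenset()  # groups granted access


def can_view(user: Optional[User], dataset: Dataset) -> bool:
    """Return True if the user (or an anonymous visitor) may see the dataset."""
    if dataset.public:
        return True  # public data is visible to everyone, logged in or not
    if user is None:
        return False  # anonymous visitors only see public data
    # Otherwise the user must share at least one group with the dataset.
    return bool(user.groups & dataset.allowed_groups)


# Example setup with hypothetical names.
reporter = User("reporter", frozenset({"investigations"}))
subscriber = User("reader", frozenset({"subscribers"}))

published = Dataset("school-budgets", public=True)
in_progress = Dataset("draft-probe", allowed_groups=frozenset({"investigations"}))
premium = Dataset("property-sales", allowed_groups=frozenset({"subscribers", "investigations"}))
```

In this sketch, publishing a work-in-progress dataset amounts to flipping its `public` flag, and a subscriber tier is just another group.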
How does this all work together?
We recently discussed the business case for our data journalism. With our Datasette Live plugins, we now have a platform we can use to offer data through different membership tiers while also addressing a newsroom need. We can create dashboards, tools, visualisations, and more for an entirely different audience while leveraging the same data that powers our reporting for the public.
With all of our project’s different components coming together, we are another step closer to achieving our goal of building revenue, resiliency, and public service into a single product.