Data Sharing Platform

Martin · 17 January 2025 14:50

The Scottish Tech Army have kindly offered to try to set up a Tech for Good Alliance project to help develop our Data Sharing Platform. We’ve produced a rough scope here, which we need to finalise for them by 7 February, as this is when we’ll know if our Launchpad bid with Firefinch was successful. Please feel free to make any comments on the scope, either here or in the repo.

Amy · 23 January 2025 17:10

Have taken a read through the scope! Think it looks great, especially as this seems to be a template that STA have provided - it doesn’t seem like loads of detail is necessary up front? My comments below (happy to open a fork/PR if more useful).

I tried to open the link to Firefinch - could just be my browser, but I couldn’t view the site because it was flagged as insecure - is there an alternative link?

Appreciate that some of the below might overlap with the proposal you’ve already put to them, so if that’s the case feel free to ignore.

I think that the Scope could benefit from a section on Definitions and a section on Tools.

Under ‘Definitions’, it would be helpful to make explicit what is meant by a couple of terms used throughout the document - for instance, ‘Relevant data’ is described as being sent to the DSP - is this all experiment data and metadata? Can users add further data or notes? Think it could also be useful to define ‘whether or not Pioreactors are being used’ and what level of use is being aimed at - e.g., is it sufficient for there to just be at least one experiment started every 6 months?
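To make the ‘in use’ question concrete, here is a minimal Python sketch of one possible definition: a Pioreactor counts as in use if at least one experiment was started within the last six months. The function name, the 183-day window and the idea that the DSP would hold a list of experiment start times are all assumptions for illustration, not anything taken from the scope.

```python
from datetime import datetime, timedelta

# Assumption: "in use" means at least one experiment started in roughly the
# last six months. The 183-day window is illustrative, not from the scope.
ACTIVE_WINDOW = timedelta(days=183)


def is_in_use(experiment_starts, now, window=ACTIVE_WINDOW):
    """Return True if any experiment started within `window` before `now`.

    `experiment_starts` is a hypothetical list of datetimes, one per
    experiment that a given Pioreactor has reported to the DSP.
    """
    return any(
        timedelta(0) <= now - started <= window
        for started in experiment_starts
    )


now = datetime(2025, 1, 23)
recent = [datetime(2024, 9, 1)]   # started within the window
stale = [datetime(2024, 1, 1)]    # started more than six months ago

print(is_in_use(recent, now))
print(is_in_use(stale, now))
```

Whatever threshold is chosen, writing it down this explicitly would let the developers know exactly which data each Pioreactor needs to report.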

Under ‘Tools’, it would be helpful to specify whether or not there is an expected stack for the DSP. I assume there isn’t (apart from Python and Pioreactor-compatible software), but perhaps it is a requirement that the stack is entirely open source and e.g. does not create lock-in to proprietary databases? It’s also mentioned that there should be documentation for the whole build, which is great - is it also desirable for the whole platform to be open source and maintainable via GitHub for easy contribution?

I also think it could be helpful to split the Scope/Goal ask into ‘Essential’ features (required by end of quarter one) and ‘Nice to have’ features (could be things that further iterations/others in the community pick up later). In the ‘Essential’ bucket, it sounds like the key requirements are: people being able to view data (or maybe just download data?), a way to track whether the Pioreactors are being used, the ability to download data in some format, and compatibility with Pioreactor plugins and software. In the ‘Nice to have’ bucket, it seems like an API to call the data fits here, and maybe the ability to analyse or visualise data on the platform itself, in aggregate and in real time.

Something that could be helpful for the developers is keeping the types of user the platform is intended for in mind - this at least includes scientists, citizen scientists/ people just having a go, and companies/organisations, all of which might want to contribute and access existing data. Good to understand if there should be any difference in the experience of the platform between these (and other) types of users?

A general question - is there thought around how data governance for the platform is handled, particularly in relation to whether or not there are plans to store or link to personal information? I think a related point is whether you want to track just the Pioreactor or also link this to users - this might also determine whether people will require a log-in or authentication for accessing the platform and data.

Also a thought on ways of working with the STA team - is the process that they take the scope away and then develop something and come back to Amybo - or will it be more iterative? If it’s the latter, it could be really good to see a proposed High Level Design for the platform first, before it’s created - this way, the scope becomes tighter as part of the process.

I hope these kinds of comments are useful - let me know either way, happy to adjust the kind of feedback I give depending on what is and isn’t helpful. Apologies if I’ve misinterpreted anything, still very new to all of the subject matter!

Best,
Amy

Amy · 23 January 2025 17:11

Also - I prompted GitHub Copilot to ‘Make suggestions for how the scope detailed in the README could be improved’ and it gave the following. They’re quite broad but have some overlap with the above, so posting in case useful.

Expand Data Coverage: Include more detailed types of data that can be shared and specify how often it should be updated.

Clarify Software Requirements: Provide clear and specific requirements for the Pioreactor software, including dependencies and compatibility information.

Detail Data Security: Address data security and privacy concerns, specifying how data will be protected and managed.

Include User Guidelines: Add detailed guidelines for users on how to contribute data and use the platform effectively.

Define Success Metrics: Establish clear success metrics for the project, such as specific data collection targets or user engagement goals.

Outline Maintenance Plan: Provide a maintenance plan to ensure the platform remains operational and up-to-date.

Expand on Training: Offer more comprehensive training resources, including detailed documentation, video tutorials, and possibly webinars.

Martin · 3 February 2025 14:39

If anyone is free to join a meeting this week so we can nail down the proposed DSP scope, that would be amazing - please indicate your availability here: Calendly - Martin Currie