
Gigantum: Decentralised Data Science

Build it. Move It. Share It.

By Marina T Alamanou · Published 3 years ago · 6 min read
Photo by Jack Dong on Unsplash (https://unsplash.com/photos/YRoJea0xGVY)

Gigantum: a web application for better collaboration while making reproducible research easier

The scientific community has a big problem, and it's called the reproducibility crisis. Whether you are a biomedical scientist or a computer scientist, you will eventually come to understand that the biggest challenge you will face in your career is this: "Reproduce the results from your own scientific paper".

What do I mean?

Without getting too technical: every single day, everywhere in the world, scientists face the same problem. "They can't reproduce the results from their own (not to mention other people's) scientific papers, even though they are 100% sure they repeated exactly what they did in their initial study!" And this is true whether you are writing code or doing wet-lab experiments. It gets even more complicated now that wet lab and code are merging, and the boundaries between traditional biotech and tech are being crossed and fused.

But how is that even possible, you might ask?

Well, imagine the following scenario.

You are about to travel, so you make a list of all the things you want to put in your suitcase and then pack it. You double-check that you took everything, and then you ask a friend to confirm that everything is inside, only to find out when you arrive at your destination that what you thought was a little jar full of coconut oil is now empty. And you can't stop asking yourself: "Where is the coconut oil? I was 100% sure it was in the jar inside my suitcase!"... Then, after days of thinking, you realise that the weather at your destination was different, so the coconut oil melted, leaked into your running shoe and was absorbed. But the damage was done, because you were in the desert and couldn't buy new coconut oil.

As a matter of fact, every day scientists all over the world face challenges bigger (much, much bigger) than the coconut oil, challenges that cost millions of dollars of taxpayers' money. Stakeholders such as universities, funding agencies, industry, the research community and the public are increasingly worried, since money and time are spent trying to work out where the coconut oil ended up, instead of on creating tools that allow scientists to completely "synchronise" their work... even when one group is working in Antarctica and the other in the Sahara desert.

One solution to these problems comes from Gigantum 🚀🖥💻📲🕸☁️ (@gigantumscience; founded 2016, Washington DC Metro Area, US), a developer of a data science platform that aims to make data science more open, transparent and reproducible, especially now that our world is increasingly driven by AI and ML.

In fact, as more and more machine learning models are being used in the real world, data scientists need more efficient tools for managing their work environments and for tracking code.

The previous solution to this problem was to use containers (lightweight, isolated environments that package code together with everything it needs to run) orchestrated by a system such as Kubernetes. But since you can't simply add an arbitrary workstation to a Kubernetes cluster, Gigantum had to rethink the architecture around containers.

In particular, many cloud-native data science platforms can’t run across bare metal, private clouds and public clouds. Attempts to adapt them to on-premises and hybrid infrastructures typically don’t go well, and most data science teams are left to their own devices to find solutions.

For this reason, the company decided to build its system on top of Docker, the de facto standard for containers. But instead of relying on Kubernetes to move data science workloads around, Gigantum based its approach on the model introduced by Git (software for tracking changes in any set of files). Just as Git lets developers work locally and sync with remote services like GitHub and GitLab, Gigantum allows data scientists to work locally on the machines of their choice and then reach out to external resources as needed. So, Gigantum is a Git-like tool built for data science.

In other words, they have a little cloud platform that runs in Docker, right on your laptop, and a remote model for moving work back and forth between different people working on different machines. But instead of just versioning files, they version and move entire workflows: code, data and environments.
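To make the idea of "versioning more than files" concrete, here is a minimal Python sketch of a Git-style snapshot: every file in a project directory (code, data and an environment spec alike) is hashed, and the snapshot ID is a hash of the whole manifest, so changing any one of them produces a new version. This is an illustrative assumption, not Gigantum's actual implementation or API; the function names `hash_file` and `snapshot` and the `code/`, `input/` and `env/` folder layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(project_dir: str) -> dict:
    """Build a manifest mapping each file's relative path to its content hash.

    The folder layout (e.g. code/, input/, env/) is hypothetical: every file
    under project_dir is included, whatever it is called.
    """
    root = Path(project_dir)
    files = {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    # The snapshot ID hashes the manifest itself, so a change to any code,
    # data or environment file yields a different ID.
    manifest_bytes = json.dumps(files, sort_keys=True).encode()
    return {"id": hashlib.sha256(manifest_bytes).hexdigest(), "files": files}
```

Calling `snapshot()` twice on an unchanged project returns the same ID, while editing, say, a pinned package version in `env/requirements.txt` yields a new ID even though the hash of the analysis code stays the same. Identifying content by its hash is similar in spirit to how Git tracks files, extended here to the environment as well.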

Gigantum is an MIT-licensed, open-source platform for Python and R coders (with possibly more languages to come). Gigantum can run anywhere (laptops, GPU machines, on-premises infrastructure, public and private clouds), which eliminates problems that can arise when collaborating across multiple infrastructures. In addition, Gigantum does not depend purely on SaaS and is therefore potentially cheaper.

Moreover, Gigantum helps scientists automate things like Git versioning, best practices, environment configuration, transfers and interaction with GPUs. These are the chores that data scientists have to do all the time, and they eat up 60% to 80% of their time.

The intended audience is anyone authoring data analysis or machine learning (including on GPUs), as well as those who need to see, use or manipulate the work. This means you can work, compute, share and collaborate across any collection of machines you want.

In the end, Gigantum resolves the three biggest technical challenges of doing data science:

  1. how to customise machines,
  2. how to share work with colleagues, and
  3. how to move work across machines,

without having to centralise everything.

Decentralisation for them means the Git model: self-contained deployment on single machines, combined with backup and transfer via a remote service. This requires the "local" software to handle development and computational tasks, while the "remote" service handles storage and backup of the integrated materials, i.e. code, data and environments.
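The local/remote split described above can be sketched in a few lines of Python. In this hypothetical model (not Gigantum's real API), each machine and the remote service hold a content-addressed `ObjectStore`, and a single `sync()` function acts as both "push" (local to remote) and "pull" (remote to local) by copying only the objects the other side lacks. Real systems also track history, branches and conflicts, which this sketch deliberately leaves out.

```python
import hashlib


class ObjectStore:
    """A minimal content-addressed store: each object is keyed by its SHA-256."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content hash (storing it twice is a no-op)."""
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


def sync(source: ObjectStore, target: ObjectStore) -> int:
    """Copy objects that target is missing; return how many were transferred.

    sync(laptop, remote) plays the role of a push, sync(remote, laptop) a pull.
    """
    missing = [key for key in source.objects if key not in target.objects]
    for key in missing:
        target.objects[key] = source.objects[key]
    return len(missing)
```

For example, work created on a laptop can be pushed to the remote with `sync(laptop, remote)` and then pulled onto a GPU workstation with `sync(remote, workstation)`; because objects are keyed by content, repeating a sync transfers nothing new. Only the remote needs to be reachable by everyone, while development and computation stay on whichever machine each person chooses.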

Additionally, Gigantum has just joined the GigaScience reproducibility toolkit. GigaScience is a peer-reviewed scientific journal, established in 2012, that covers research and large datasets resulting from work in the life sciences. GigaScience has always focused on reproducibility rather than subjective impact. However, since the journal experienced some difficulties in carrying out reproducibility case studies, it has just started using the Gigantum workbench, which can run anywhere and so avoids the problems that arise when collaborating across multiple infrastructures and contexts.

The Gigantum team is a hub of people with master's degrees and PhDs in mathematics and computer science, and the founders are: J. Tyler Whitehouse PhD 🎓, Co-Founder, who is also the 🕶 CEO as well as President and Board Member; Dean Kleissas, Co-Founder, 🖥 CTO and Board Member; Joshua Vogelstein PhD 🎓, Co-Founder and 📚 Advisor; R. Jacob Vogelstein PhD 🎓, Co-Founder and 📚 Advisor; and Randal Burns PhD 🎓, Co-Founder and 📚 Advisor.

So far the company has had three investors 💰💵: Digital Science Accelerator/Incubator, the Defense Advanced Research Projects Agency (DARPA) and the United States Department of Defense. These PhD guys have been developing their platform for the last couple of years, and they currently have around 400 users. They are about to pop above the radar and begin their sales process, with healthcare companies and banks as their primary targets.

Finally, in the following video you can watch an intro to how Gigantum makes it easy to work on different machines and clouds:

And if you "Just can't seem to get enough" of Gigantum, then Kenneth Sanford PhD, Gigantum's Strategic Advisor for go-to-market, has something more to say:

Thank you for reading (and watching) 👓💙

And if you liked this post, why not share it?

@MetaphysicalCells

#science #datascience #drugdiscovery #drugdevelopment #AI #biotechAI #cloud #cloudcomputing 🚀🖥💻📲🕸☁️

👉🏻🕵🏻 References

Gigantum Blog

Is Kubernetes Really Necessary for Data Science?

Gigantum Joins the Giga Reproducibility Toolkit


About the Creator

Marina T Alamanou

Life Science Consultant #metaphysicalcells

MetaphysicalCells


    © 2024 Creatd, Inc. All Rights Reserved.