Marine Morales

Popularizing Data. Empowering Analysts. Elevating Insights.


Data Science Tech Stack Series: Working Environments

Posted on August 15, 2023 (updated February 12, 2025) by Marine Morales

When doing Data Science, it is crucial to choose the right working environment. Our goal is to optimize our workflow across the various stages of data wrangling, coding, collaboration, and results interpretation. In this article, we will explore three types of environments with a progressively richer set of features: code editors (1), integrated development environments (2), and interactive computing environments (3). We will also introduce the most popular collaboration platforms and tools (4), and we will conclude with the very useful package distributions (5) that let us download and install, in one go, everything we need to do Data Science.

1. Code Editors

Code editors are software tools used for writing and editing code. They are lightweight and offer a simple and streamlined workflow for smaller-scale coding tasks. They typically offer basic features such as syntax highlighting, auto-indentation, code completion, code snippets, code folding and version control. Here are some of the best code editors available today that support Python, SQL and R:

| Code Editors | Notepad++ | Sublime Text | Emacs |
| --- | --- | --- | --- |
| Free | Yes | No | Yes |
| Open-source | Yes | No | Yes |
| Developer | Don Ho | Jon Skinner | Community-owned |
| Primary users | Software Developers | Software Developers | Software Developers, Data Scientists, Researchers |
| Main Feature | Powerful | Customizable | Natural language processing, Computational linguistics |
| Cross-platform | No | Yes | Yes |
| Git integration* | No | Yes | No |
| Debugger | Yes | No, you'll need to use third-party plugins | Yes |
*cf. Section 4. Collaboration Platforms and Tools

2. Integrated Development Environments (IDE)

Integrated Development Environments are more comprehensive software tools than code editors, with additional functionalities and integrations with other tools. As a result, they are more heavyweight than code editors, but they improve productivity and complexity management for medium-scale development projects. IDEs typically include all the features of a code editor along with other tools such as build automation, code refactoring and project management. They are usually cross-platform and typically integrate with version control systems like Git (cf. Section 4. Collaboration Platforms and Tools), which allows users to manage their code and projects and collaborate on them with others. Here are some of the best IDEs available today that support Python, SQL and R:

| IDEs | Visual Studio | PyCharm | Spyder | RStudio |
| --- | --- | --- | --- | --- |
| Free | Yes | Yes, only for the Community Edition | Yes | Yes |
| Open-source | Yes | Yes, only for the Community Edition | Yes | Yes |
| Developer | Microsoft | JetBrains | Spyder project contributors (MIT-licensed) | RStudio, Inc. (now Posit) |
| Primary users | Microsoft Software Developers | Python Software Developers | Python Data Scientists | Data Scientists & Statisticians |
| Main Feature | Azure compatible | Specialised in Python | Scientific environment for Python | Strong statistician community |

If you happen to work for a big tech company other than Microsoft, you may be asked to use its proprietary IDE: AWS Cloud9 at Amazon, Xcode at Apple and Google Cloud Shell at Google.

3. Interactive Computing Environments

Compared to the IDEs and code editors mentioned previously, an Interactive Computing Environment is a more comprehensive working environment. Its primary focus, as the name suggests, is on providing an environment where we can work with our data interactively, code intuitively, and flexibly create and share documents that combine live code, narrative text, visualizations, and other multimedia elements. It is typically used for medium-scale data processing projects, including exploratory data analysis, data visualization, and rapid prototyping of machine learning models. Popular web-based interactive computing environments that support Python, SQL and R include:

| ICEs | Jupyter Notebook | JupyterLab | Apache Zeppelin | MATLAB Online |
| --- | --- | --- | --- | --- |
| Free | Yes | Yes | Yes | No |
| Open-source | Yes | Yes | Yes | No |
| Cloud-based | No | No | No | Yes |
| Developer | Fernando Pérez & Brian Granger | Fernando Pérez & Brian Granger | Moon Soo Lee, Sungwook Yoon & Hyungtae Kim under the Apache License | MathWorks |
| Primary users | Software Developers & Data Scientists | Software Developers & Data Scientists | Data Scientists | Data Scientists, Engineers, Researchers in Maths, Physics, Finance & Biology |
| Main Feature | Ease of use, versatility | Next generation of Jupyter Notebook, more powerful & more extensions | Specifically designed for data analysis, highly extensible & customizable | Includes Simulink to graphically model and simulate dynamic systems |

If you happen to work for a big tech company, you may be asked to use its proprietary interactive computing environment: Azure Notebooks at Microsoft and Google Colaboratory at Google. Both build on Jupyter Notebook to offer a cloud version. These proprietary versions provide access to GPUs and TPUs to accelerate model training and computation.

The above Interactive Computing Environments are primarily designed for individual use and small-scale collaborations. They tend to have limited or no built-in support for real-time collaboration and multi-user editing. This means that we can share our notebooks on the platform servers or in the cloud to ask for comments and suggestions, but we cannot all edit the same notebook at the same time. Hence it is better to connect our environment to version control and collaboration tools like GitHub.
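One reason notebooks pair well with version control is that a Jupyter notebook is stored as a JSON text file. The sketch below builds a minimal notebook by hand to show how narrative text and live code sit side by side as cells. The function name and the cell contents are made up for illustration, and the layout assumes version 4.4 of the notebook format; in practice the notebook application writes this file for us.

```python
import json


def build_minimal_notebook():
    """Return a dict laid out like a Jupyter notebook file (format 4.4)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                # Narrative text lives in markdown cells.
                "cell_type": "markdown",
                "metadata": {},
                "source": ["# Sales exploration\n", "A short note explaining the analysis."],
            },
            {
                # Live code lives in code cells; outputs are filled in when the cell runs.
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["monthly_sales = [120, 80, 100]\n", "sum(monthly_sales)"],
            },
        ],
    }


notebook = build_minimal_notebook()

# Serialising to JSON gives the text that would be saved in a .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```

Saving `notebook_json` to a file with the `.ipynb` extension should give a document that Jupyter Notebook or JupyterLab can open, and that Git can track like any other text file.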

4. Collaboration Platforms and Tools

We will break down the collaboration platforms and tools into three groups: version control systems, collaboration and hosting systems, and Artificial Intelligence (AI) collaboration platforms.

Version control systems. The purpose of a version control tool is to allow multiple people to work on the same codebase without conflicting with each other, while precisely tracking the changes made to the code over time. The most popular version control software is Git. Git is a free, open-source distributed version control system. It is mostly used by software developers who need to collaborate on a programming project. With Git, developers can make changes to code on their local machine and then push those changes to a central repository, where they can be shared with other members of the development team.
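The local-then-push workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: Git is normally driven from a terminal, but here the same commands are called from Python so the whole example runs in one script. The helper names, the file name, the commit message and the demo identity are all hypothetical, and a temporary bare repository stands in for the shared central repository.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside `cwd` and return its trimmed standard output."""
    command = [
        "git",
        # Throwaway identity so the commit works on a machine with no git configuration.
        "-c", "user.name=Demo Analyst",
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(command, cwd=cwd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def demo_push_workflow():
    """Commit a file locally, push it to a stand-in central repository, and return that repository's log."""
    with tempfile.TemporaryDirectory() as tmp:
        central = Path(tmp) / "central.git"  # plays the role of the shared repository
        local = Path(tmp) / "local"          # plays the role of our own machine
        central.mkdir()
        local.mkdir()

        git("init", "--bare", cwd=central)
        git("init", cwd=local)

        # 1. Make a change locally and record it as a commit.
        (local / "analysis.py").write_text("print('hello, data')\n")
        git("add", "analysis.py", cwd=local)
        git("commit", "-m", "Add first analysis script", cwd=local)

        # 2. Push the commit to the central repository so teammates can fetch it.
        git("remote", "add", "origin", str(central), cwd=local)
        git("push", "origin", "HEAD", cwd=local)

        # The central repository now contains the commit.
        return git("log", "--oneline", "--all", cwd=central)


if shutil.which("git"):  # only run when the git command-line tool is installed
    print(demo_push_workflow())
```

With a hosting service such as GitHub, the central repository would be a remote URL instead of a local folder, but the add, commit and push steps stay the same.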

Collaboration and hosting systems. The purpose of a collaboration tool is to facilitate communication, issue tracking, quality control and accountability among the stakeholders involved in the program being built. Built on top of Git, introduced above, GitHub provides a centralized hosting platform where developers can store and share their Git repositories, making it even easier to create repositories, contribute to open-source projects, and collaborate with other developers on code development and maintenance. You can create an account for free. As a data scientist, I also use GitHub to post my work and share it with a broader community. Posting my Jupyter notebooks, for instance, helps me build my professional profile and showcase my skills to potential employers or collaborators. Alternatives include GitLab and proprietary tools like AWS CodeCommit.

Artificial Intelligence (AI) Collaboration Platforms. These platforms enable teams of data scientists, engineers, and other stakeholders to work together more efficiently and effectively on complex AI projects. They provide a range of tools and services for building, customizing, training, deploying and experimenting with machine learning models. The platforms also support data management and collaboration, and they integrate with Jupyter Notebook and JupyterLab. Some examples of popular AI collaboration platforms include:

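| AI Platforms | Amazon SageMaker | Google Cloud AI Platform | IBM Watson Studio | Azure Machine Learning Studio | Databricks |
| --- | --- | --- | --- | --- | --- |
| Free | Only up to 250h | Only up to 120h | Only the Lite plan (25GB) | Only up to 4h per month | Only the Community Edition |
| Open-source | No | No | No | No | No |
| Cloud-based | Yes | Yes | Yes | Yes | Yes |
| Developer | Amazon | Google | IBM | Microsoft | Databricks (founded by the creators of Apache Spark) |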

5. Package Distribution Software

A package distribution enables us to download and install, in one go, all of the related software we need: the programming language, key library and framework packages, version control systems, working environments, etc. It also lets us easily manage these software packages and their dependencies, and avoid conflicts between different package versions along our Data Science journey. Below are some of the most popular package distributions for Python and R:

| Python Distribution | R Distribution |
| --- | --- |
| Anaconda: popular open-source choice for data science and scientific computing | RStudio Desktop: popular open-source choice, particularly for data science and statistical analysis |
| Enthought Canopy: another popular open-source distribution | Revolution R Open: preferred open-source distribution for performance optimizations |
| ActivePython: commercial enterprise distribution | Microsoft R Open (MRO): commercial distribution for performance optimizations |
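Once a distribution is installed, it helps to check which package versions we actually have, since version mismatches are a common source of conflicts. The sketch below uses Python's standard `importlib.metadata` module to do so. The function name and the list of packages are only examples, not a required set.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package names; replace them with the libraries your project relies on.
for name, version in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Recording these versions alongside a project makes it easier for a collaborator to recreate the same environment later.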

Explore more

Check my post that introduces the full stack: Baking Up The Ultimate Data Science Tech Stack

ABOUT ME

Welcome to my little corner of the internet where we explore the wonderful world of Data Science and uncover hidden insights together. My name is Marine and I am a Data and Business Intelligence Analyst specialized in optimizing Marketing and Sales performances.

©2023 Marine Morales