Module 2 of 11 · AWS SageMaker — End-to-End ML Platform · Intermediate

SageMaker Studio & Notebooks

Duration: 50 min

SageMaker Studio is a web-based integrated development environment (IDE) for ML workflows. This module covers Studio setup, the JupyterLab interface, kernel management, and instance type selection for different workloads.

SageMaker Studio Overview

Studio provides a unified web interface with built-in Git integration, experiment tracking, and model registry access. The notebook servers are managed for you, so there are no separate Jupyter servers to run, and other SageMaker services are available from the same interface.

Creating a Studio Domain

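# Create a SageMaker Studio domain
# A domain normally also needs a VPC and at least one subnet;
# the vpc-xxxxx and subnet-xxxxx IDs below are placeholders.
aws sagemaker create-domain \
  --domain-name ml-studio \
  --auth-mode IAM \
  --default-user-settings \
    ExecutionRole=arn:aws:iam::123456789012:role/SageMakerRole \
  --vpc-id vpc-xxxxx \
  --subnet-ids subnet-xxxxx \
  --region us-east-1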
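
Domain creation is asynchronous: the call returns while the domain is still being provisioned. The sketch below polls `describe_domain` until the status is `InService`. The helper name `wait_for_domain`, its defaults, and the injected `sleep` parameter are illustrative choices, not part of the SageMaker API.

```python
import time


def wait_for_domain(client, domain_id, poll_seconds=15, timeout_seconds=900,
                    sleep=time.sleep):
    """Poll describe_domain until the domain is InService.

    `client` is a boto3 SageMaker client. Raises RuntimeError if the
    domain reports a failed status, TimeoutError if the wait runs out.
    """
    waited = 0
    while True:
        status = client.describe_domain(DomainId=domain_id)["Status"]
        if status == "InService":
            return status
        # Covers Failed, Update_Failed, and Delete_Failed.
        if status.endswith("Failed"):
            raise RuntimeError(f"Domain {domain_id} entered status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Domain {domain_id} still {status} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (requires AWS credentials; 'd-xxxxx' is a placeholder domain ID):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   wait_for_domain(client, "d-xxxxx")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object before pointing it at a real account.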

Launching Studio and User Profiles

import boto3

sagemaker_client = boto3.client('sagemaker', region_name='us-east-1')

# Create user profile
response = sagemaker_client.create_user_profile(
    DomainId='d-xxxxx',
    UserProfileName='data-scientist-1',
    UserSettings={
        'ExecutionRole': 'arn:aws:iam::123456789012:role/SageMakerRole',
        'SharingSettings': {
            'NotebookOutputOption': 'Allowed',
            'S3OutputPath': 's3://my-bucket/studio-output'
        }
    }
)

print(f"User profile created: {response['UserProfileArn']}")
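
Creating the profile does not open Studio by itself. One way to launch it programmatically is to request a presigned URL for the profile, sketched below with the boto3 `create_presigned_domain_url` call. The wrapper name `studio_login_url` and the 12-hour default are assumptions made for illustration.

```python
def studio_login_url(client, domain_id, user_profile_name, session_seconds=43200):
    """Return a short-lived URL that opens Studio for the given user profile.

    `client` is a boto3 SageMaker client. `session_seconds` sets how long
    the Studio session stays valid once the URL has been opened.
    """
    response = client.create_presigned_domain_url(
        DomainId=domain_id,
        UserProfileName=user_profile_name,
        SessionExpirationDurationInSeconds=session_seconds,
    )
    return response["AuthorizedUrl"]


# Usage (requires AWS credentials; 'd-xxxxx' is a placeholder domain ID):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   print(studio_login_url(client, "d-xxxxx", "data-scientist-1"))
```

The returned URL grants access to the profile, so treat it like a credential and avoid writing it to shared logs.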

JupyterLab Interface

Studio uses JupyterLab as the notebook interface. The left sidebar shows the file browser, running kernels, and extensions. The main editor area can hold multiple notebooks, terminals, and text editors open at the same time.

# Access SageMaker session from Studio notebook
import sagemaker
from sagemaker import get_execution_role

session = sagemaker.Session()
role = get_execution_role()
bucket = session.default_bucket()

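# List classic notebook instances in the account (separate from Studio notebooks).
# Session has no list method of its own, so use its underlying boto3 client.
notebooks = session.sagemaker_client.list_notebook_instances()
print(f"Notebook instances: {len(notebooks['NotebookInstances'])}")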
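
Notebook instances are a separate, older resource. Inside a Studio domain, running compute is tracked as apps, so the Studio equivalent of the listing above is the boto3 `list_apps` call. The sketch below follows `NextToken` by hand; the helper names `list_studio_apps` and `running_apps` are illustrative.

```python
def list_studio_apps(client, domain_id, user_profile_name=None):
    """Return every app in a Studio domain, optionally for one user profile.

    `client` is a boto3 SageMaker client. Pages are followed manually
    through the NextToken field of each response.
    """
    kwargs = {"DomainIdEquals": domain_id}
    if user_profile_name:
        kwargs["UserProfileNameEquals"] = user_profile_name
    apps = []
    while True:
        page = client.list_apps(**kwargs)
        apps.extend(page.get("Apps", []))
        token = page.get("NextToken")
        if not token:
            return apps
        kwargs["NextToken"] = token


def running_apps(apps):
    """Keep only apps that are currently in service (and therefore billed)."""
    return [app for app in apps if app.get("Status") == "InService"]


# Usage (requires AWS credentials; 'd-xxxxx' is a placeholder domain ID):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-1")
#   for app in running_apps(list_studio_apps(client, "d-xxxxx")):
#       print(app["AppType"], app["AppName"])
```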

Kernel Management

Studio kernels come from container images. SageMaker-provided images supply Python 3 kernels, and custom images can add other kernels such as R. Each kernel runs in an isolated environment with its own dependencies.

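# Install packages into the current kernel (run in a notebook cell).
# The %pip magic targets the environment the kernel is running in.
%pip install sagemaker boto3 pandas scikit-learn

# Check the installed version of a package
%pip show sagemaker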
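
Because each kernel has its own environment, it can help to confirm from inside the kernel which versions it actually sees. A minimal sketch using only the standard library follows; the helper name `installed_versions` is illustrative.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report on the packages installed earlier in this section.
for name, version in installed_versions(
    ["sagemaker", "boto3", "pandas", "scikit-learn"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in two notebooks attached to different kernels shows whether their dependencies have drifted apart.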

Instance Types for Studio

{
  "instance_types": {
    "ml.t3.medium": {
      "use_case": "Development, light workloads",
      "cpu": 2,
      "memory_gb": 4,
      "cost_per_hour": 0.05
    },
    "ml.m5.large": {
      "use_case": "General purpose, data exploration",
      "cpu": 2,
      "memory_gb": 8,
      "cost_per_hour": 0.10
    },
    "ml.p3.2xlarge": {
      "use_case": "GPU-intensive, deep learning",
      "cpu": 8,
      "memory_gb": 61,
      "gpu": 1,
      "cost_per_hour": 3.06
    }
  }
}
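
The table can drive a simple selection rule: pick the cheapest instance that meets the workload's memory and GPU needs. The sketch below copies the memory, GPU, and hourly cost figures from the table above; treat the prices as illustrative and check current AWS pricing before budgeting. The function names are hypothetical.

```python
# Figures copied from the instance table in this module (illustrative prices).
INSTANCE_TYPES = {
    "ml.t3.medium": {"memory_gb": 4, "gpu": 0, "cost_per_hour": 0.05},
    "ml.m5.large": {"memory_gb": 8, "gpu": 0, "cost_per_hour": 0.10},
    "ml.p3.2xlarge": {"memory_gb": 61, "gpu": 1, "cost_per_hour": 3.06},
}


def cheapest_instance(min_memory_gb=0, needs_gpu=False):
    """Return the lowest-cost instance type that meets the requirements.

    Returns None when nothing in the table is large enough.
    """
    candidates = [
        (spec["cost_per_hour"], name)
        for name, spec in INSTANCE_TYPES.items()
        if spec["memory_gb"] >= min_memory_gb
        and (spec["gpu"] > 0 or not needs_gpu)
    ]
    if not candidates:
        return None
    return min(candidates)[1]


def estimated_cost(instance_type, hours):
    """Estimate the cost in dollars of running an instance for `hours`."""
    return round(INSTANCE_TYPES[instance_type]["cost_per_hour"] * hours, 2)


choice = cheapest_instance(min_memory_gb=8)
print(choice, estimated_cost(choice, hours=40))
```

Starting on the smallest instance that fits and switching up only when a notebook runs out of memory or needs a GPU keeps idle cost low.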

Working with Notebooks in Studio

# Load, explore, and save a CSV file from a Studio notebook
# (pandas needs the s3fs package installed to read and write s3:// paths)
import pandas as pd

# Load data
data = pd.read_csv('s3://my-bucket/data.csv')
print(f"Data shape: {data.shape}")

# Basic exploration: summary statistics and missing values per column
print(data.describe())
print(data.isnull().sum())

# Save processed data
data.to_csv('s3://my-bucket/processed_data.csv', index=False)
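
The notebook above hardcodes both S3 paths. A small helper can derive the output location from the input so the two stay in sync. This is a standard-library sketch; the function names and the `processed_` prefix convention are assumptions taken from the example paths, not a SageMaker feature.

```python
import posixpath
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def processed_uri(input_uri, prefix="processed_"):
    """Return the URI of the processed file, next to the input file."""
    bucket, key = split_s3_uri(input_uri)
    folder, filename = posixpath.split(key)
    return f"s3://{bucket}/{posixpath.join(folder, prefix + filename)}"


print(processed_uri("s3://my-bucket/data.csv"))
```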

Git Integration

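# Clone a repository in the Studio terminal and move into it
git clone https://github.com/aws/amazon-sagemaker-examples.git
cd amazon-sagemaker-examples

# Commit and push changes (pushing requires write access to the remote)
git add .
git commit -m "Update notebook"
git push origin main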
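
Notebook files store cell outputs alongside the code, so commits can carry large, noisy diffs. One common convention, which this module does not prescribe, is to clear outputs before `git add`. The sketch below does that with the standard library, assuming the nbformat 4 layout (a top-level `cells` list); the function names are illustrative.

```python
import json


def strip_outputs(notebook):
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path):
    """Rewrite the .ipynb file at `path` with its outputs removed."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    strip_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")


# Usage ('analysis.ipynb' is a placeholder file name):
#   strip_file("analysis.ipynb")
```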

Quiz 1

❓ What is SageMaker Studio?

Quiz 2

❓ Which notebook interface does Studio use?

Quiz 3

❓ Which instance type is best for GPU-intensive deep learning?

Quiz 4

❓ How do you install packages in a Studio kernel?

Quiz 5

❓ What is the primary benefit of Studio's Git integration?
