Deploying Models

Duration: 7 min

This module covers two core steps for deploying machine learning models into production: serializing a trained model to disk and serving it over a network API with TensorFlow Serving. Done well, deployment makes a model accessible to the applications that need it, able to scale with request volume, and maintainable as new versions are trained.

Model Serialization

Model serialization is the process of converting a trained machine learning model into a format that can be saved to disk and later loaded for inference. TensorFlow's Keras API supports saving either the entire model (architecture, weights, and, for a compiled model, optimizer state) or only the weights. Serialized models can be deployed to various environments, such as web servers, mobile applications, or edge devices.
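The two saving options can be compared in a short sketch. It assumes a recent TensorFlow (2.13 or newer, for the native .keras format and the .weights.h5 file naming that Keras 3 requires); the build_model helper, the file names full_model.keras and only.weights.h5, and the random inputs are illustrative choices. The input width of 9 is picked so the parameter counts match the summary output shown in this section.

```python
import numpy as np
import tensorflow as tf

def build_model():
    # Illustrative architecture: 9 inputs -> 64 ReLU units -> 10-way softmax
    return tf.keras.Sequential([
        tf.keras.Input(shape=(9,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

model = build_model()
x = np.random.rand(4, 9).astype("float32")

# Option 1: save the entire model; load_model needs no Python code for the architecture.
model.save("full_model.keras")
restored_full = tf.keras.models.load_model("full_model.keras")

# Option 2: save only the weights; the same architecture must be rebuilt before loading.
model.save_weights("only.weights.h5")
restored_weights = build_model()
restored_weights.load_weights("only.weights.h5")

# Both restored copies should reproduce the original model's predictions.
original = model.predict(x, verbose=0)
same_full = np.allclose(original, restored_full.predict(x, verbose=0))
same_weights = np.allclose(original, restored_weights.predict(x, verbose=0))
print(same_full, same_weights)
```

Saving only the weights produces a smaller artifact but ties loading to code that recreates the exact architecture; saving the full model is usually the simpler choice for deployment.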

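import tensorflow as tf

# Assume 'model' is a trained Keras model
# Save the entire model (architecture + weights) in the legacy HDF5 format;
# recent Keras versions recommend the native format instead, e.g. 'my_model.keras'
model.save('my_model.h5')

# Load the model
loaded_model = tf.keras.models.load_model('my_model.h5')

# Verify the loaded model; summary() prints the table itself and returns None
loaded_model.summary()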

Example output:

Model: "sequential"
__________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 64)                640       
__________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
=================================================================
Total params: 1,290
Trainable params: 1,290
Non-trainable params: 0
__________________________________________________________________

Serving Models with TensorFlow Serving

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It allows you to deploy models via REST or gRPC APIs, making it easy to integrate with various applications. To use TensorFlow Serving, you need to export your model in the SavedModel format and then start a TensorFlow Serving instance.
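TensorFlow Serving looks inside the model base path for numbered version subdirectories (1, 2, ...) and, by default, serves the highest one. The small helper below is a hypothetical convenience, not part of TensorFlow Serving: it picks the next free version number and returns an absolute path, since an absolute --model_base_path is the safest choice for the server.

```python
import tempfile
from pathlib import Path

def next_version_dir(model_base_path):
    """Return the absolute path model_base_path/<N+1>, where N is the highest
    existing numeric version subdirectory (0 if there is none)."""
    base = Path(model_base_path).resolve()
    versions = []
    if base.is_dir():
        # Only purely numeric folder names count as model versions
        versions = [int(p.name) for p in base.iterdir() if p.is_dir() and p.name.isdigit()]
    return base / str(max(versions, default=0) + 1)

# Usage on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    first = next_version_dir(tmp)        # no versions yet, so version 1
    (Path(tmp) / "1").mkdir()
    (Path(tmp) / "7").mkdir()
    (Path(tmp) / "notes").mkdir()        # ignored: not a number
    after = next_version_dir(tmp)        # highest existing version is 7, so version 8
print(first.name, after.name)  # 1 8
```

In a real export you might pass str(next_version_dir('saved_model')) to model.export() (TF 2.13 or newer assumed), so each new export lands in a fresh version folder that the running server can pick up.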

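import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Create a simple model that expects 32 input features
model = Sequential([
    Input(shape=(32,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Export in SavedModel format. TensorFlow Serving expects numbered version
# subdirectories under the base path, so write version 1 to saved_model/1.
# (model.export() needs TF 2.13+; on older tf.keras, model.save('saved_model/1') writes a SavedModel.)
model.export('saved_model/1')

# To serve the model, run the following command in a terminal
# (--model_base_path should be the absolute path of the folder holding the version directories):
# tensorflow_model_server --rest_api_port=8501 --model_name=my_model --model_base_path="$(pwd)/saved_model"

# Example REST API request
import requests

data = {'instances': [[0.1] * 32]}  # One instance with 32 features, matching the model's input shape
response = requests.post('http://localhost:8501/v1/models/my_model:predict', json=data)
print(response.json())  # {'predictions': [[...10 class probabilities...]]}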
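A row-format request (one with an "instances" key) comes back as a "predictions" list with one entry per instance, while failures such as a shape mismatch come back as an "error" message. The hypothetical helper below turns the parsed response into the most likely class per instance; the sample probabilities are made up for illustration and are not real server output.

```python
def predicted_classes(payload):
    """Map a parsed TF Serving REST predict response to the argmax class per instance."""
    if "error" in payload:
        # TF Serving reports request failures as {"error": "..."}
        raise RuntimeError(f"TF Serving error: {payload['error']}")
    # One list of class probabilities per instance; pick the index of the largest value
    return [max(range(len(probs)), key=probs.__getitem__) for probs in payload["predictions"]]

# Made-up probabilities for two instances (3 classes for brevity)
sample = {"predictions": [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]}
print(predicted_classes(sample))  # [1, 0]
```

With a live server, the call would be predicted_classes(response.json()).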

💡 Tip: Ensure that the input data format matches the expected input shape of your model when making predictions via TensorFlow Serving.
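One way to follow this tip is to check the feature count on the client before sending anything, so a mistake such as sending 30 values to a model that expects 32 fails early with a clear message. Both helpers below are hypothetical; the URL pattern, including the optional /versions/N segment for pinning a specific model version, follows the TensorFlow Serving REST API.

```python
import json

def predict_url(host, port, model_name, version=None):
    """Build the TF Serving REST predict URL, optionally pinned to one model version."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    return f"http://{host}:{port}{path}:predict"

def build_predict_body(instances, num_features):
    """Check that every instance has num_features values, then encode a row-format request."""
    for i, row in enumerate(instances):
        if len(row) != num_features:
            raise ValueError(f"instance {i} has {len(row)} values, expected {num_features}")
    return json.dumps({"instances": instances})

url = predict_url("localhost", 8501, "my_model")
body = build_predict_body([[0.1] * 32], num_features=32)
print(url)  # http://localhost:8501/v1/models/my_model:predict
```

The encoded body can then be sent with requests.post(url, data=body, headers={"Content-Type": "application/json"}).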

❓ Which method is used to save an entire TensorFlow/Keras model to disk?

model.to_json()
model.save_weights('my_model.h5')
model.save('my_model.h5')
model.export('my_model.h5')

❓ What command is used to start a TensorFlow Serving instance for a saved model?

tensorflow_model_server --model_name=my_model --model_base_path=saved_model/
tensorflow_serve --model=saved_model/
tensorflow_start --model_path=saved_model/
tensorflow_deploy --model=saved_model/
