Module 10 of 25 · Ensemble Learning — Bagging, Boosting, XGBoost, LightGBM, CatBoost, Stacking, Voting · Intermediate

LightGBM: Introduction and Setup

Duration: 5 min

This module introduces LightGBM, a high-performance, distributed gradient boosting framework. You will learn why LightGBM is a popular choice for fast, memory-efficient machine learning, especially on large datasets. We will then install it, set it up, and run a simple example to get you started.

Introduction to LightGBM

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient. Compared with many other gradient boosting implementations, its advantages include faster training speed, lower memory usage, often better accuracy, support for parallel, distributed, and GPU learning, and the capability to handle large-scale data.
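The core idea behind LightGBM, gradient boosting with trees, can be sketched in plain Python. This is a toy illustration under simplifying assumptions (a single feature, squared-error loss, and depth-1 trees called "stumps"); it shows the additive, residual-fitting loop, not LightGBM's own histogram-based, leaf-wise implementation. All function names here are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (a "stump") to residuals on one feature."""
    mean_all = sum(residuals) / len(residuals)
    best = None  # (squared error, threshold, left value, right value)
    # Skip the largest value so the right side is never empty
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_val = sum(left) / len(left)
        right_val = sum(right) / len(right)
        err = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_val, right_val)
    if best is None:  # only one distinct x value: no split is possible
        return lambda x: mean_all
    _, t, left_val, right_val = best
    return lambda x: left_val if x <= t else right_val


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Additive model F(x) = base + learning_rate * (sum of stumps)."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual y - F(x)
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy 1-D regression data: y = x^2 on a grid
xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
baseline = sum(ys) / len(ys)
print(mse(lambda x: baseline, xs, ys), mse(model, xs, ys))
```

The first number is the error of always predicting the mean; the second, for the boosted stumps, should be much lower. Each round fits a small tree to what the current ensemble still gets wrong, and the learning rate keeps any single tree from dominating, which is the same role learning_rate plays in LightGBM's parameters.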

# The import raises ModuleNotFoundError if LightGBM is not installed
import lightgbm as lgb

# Print the installed version
print(lgb.__version__)


3.3.2
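The version matters because lgb.train() dropped its early_stopping_rounds keyword in LightGBM 4.0; from then on, early stopping is passed as a callback (lgb.early_stopping), which also works on 3.x. If you maintain scripts that must run on either major version, a small standard-library helper like the hypothetical one below can compare version strings. It is a simplified sketch: it keeps only the leading numeric parts, and the packaging library's Version class is a more robust option.

```python
def parse_version(version):
    """Turn a version string such as '3.3.2' into a tuple of ints, e.g. (3, 3, 2).

    Simplified: each dot-separated part keeps only its leading digits,
    and parsing stops at the first part with no leading digit.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def train_accepts_early_stopping_rounds(version):
    """lgb.train() lost the early_stopping_rounds keyword in LightGBM 4.0."""
    return parse_version(version) < (4,)


print(parse_version("3.3.2"))                        # (3, 3, 2)
print(train_accepts_early_stopping_rounds("3.3.2"))  # True
print(train_accepts_early_stopping_rounds("4.1.0"))  # False
```

In real code you would call it as train_accepts_early_stopping_rounds(lgb.__version__). Writing new code with the callback form avoids the branch entirely.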

Setting Up LightGBM

To set up LightGBM, install it with pip:

pip install lightgbm

Once installed, you can import it into your Python environment and use it for machine learning tasks. The example below trains a simple multiclass LightGBM model on the Iris dataset; it also uses scikit-learn to load and split the data (pip install scikit-learn).

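import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset (150 samples, 4 features, 3 classes)
data = load_iris()
X, y = data.data, data.target

# Split dataset into training set (70%) and test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create Dataset objects for LightGBM; reference=train_data makes the
# validation data reuse the training data's feature binning
train_data = lgb.Dataset(X_train, label=y_train)
test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

# Specify your configuration as a dict
params = {
    'boosting_type': 'gbdt',
    'objective': 'multiclass',
    'num_class': 3,
    'metric': 'multi_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbosity': -1
}

# Train, stopping early if the validation loss does not improve for 5 rounds.
# The callback form works in LightGBM 3.x and 4.x; the early_stopping_rounds
# keyword was removed from lgb.train() in 4.0.
# For brevity the test set doubles as the validation set here; in practice,
# hold out a separate validation split so the test score stays unbiased.
gbm = lgb.train(params,
                train_data,
                num_boost_round=20,
                valid_sets=[test_data],
                callbacks=[lgb.early_stopping(stopping_rounds=5, verbose=False)])

# Save model to file
gbm.save_model('model.txt')

# Predict class probabilities using the best iteration found by early stopping
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
print('Prediction shape:', y_pred.shape)
Prediction shape: (45, 3)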
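For a multiclass objective, gbm.predict returns one probability per class, so y_pred has one row per test sample and one column per Iris class. To turn those probabilities into class labels, take the argmax of each row (with NumPy, y_pred.argmax(axis=1)). The standard-library sketch below spells out that step, plus accuracy and the multiclass log loss (the mean negative log-probability of the true class, the quantity the multi_logloss metric reports), on a small hypothetical probability matrix rather than real model output. The eps clipping is an assumption added to avoid log(0).

```python
import math


def predicted_labels(proba_rows):
    """Pick the class with the highest probability in each row (argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba_rows]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multi_logloss(y_true, proba_rows, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    Probabilities are clipped to [eps, 1 - eps] (an assumption) to avoid log(0).
    """
    total = 0.0
    for label, row in zip(y_true, proba_rows):
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical probabilities for 4 samples and 3 classes (each row sums to 1)
proba = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
]
y_true = [0, 1, 2, 1]

labels = predicted_labels(proba)
print(labels)                                   # [0, 1, 2, 0]
print(accuracy(y_true, labels))                 # 0.75
print(round(multi_logloss(y_true, proba), 4))   # 0.5017
```

Note that the last sample is misclassified (0 instead of 1), yet it still contributes a finite penalty, -log(0.4), to the log loss. This is why the loss is a smoother signal for early stopping than accuracy.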

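💡 Tip: Check how your data is encoded before feeding it into LightGBM. LightGBM handles missing values (NaN) natively, and it can split on categorical features directly if you declare them, for example through the categorical_feature argument of lgb.Dataset, with categories encoded as non-negative integers. Declaring categorical features this way usually removes the need for one-hot encoding.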

❓ What is one of the main advantages of using LightGBM?

❓ Which parameter in LightGBM controls the number of leaves in a tree?
