A cost effective way to add advanced statistical methods and Machine Learning to your Power BI reports is through the use of Python scripts. The method that we recommend is outlined as follows:

  1. Export the data you want to train or create your model on
  2. Outside of Power BI, create and train your model
  3. Export the completed model
  4. In your Power BI Report, load the model and make predictions
  5. Merge predictions with the underlying data

Our goal is to provide starting points and guidelines for each of the most useful models you can use.

Linear Regression

Steps 1 & 2 are straight-forward excercises in using SKlearn or other libraries in Python. Make sure to export the completed model into a joblib file. An example of a model on a public dataset vgsales is as follows:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn import preprocessing
from joblib import dump, load

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

dataset = pd.read_csv(".../Iowa Liquor Sales/Data for Modelling.csv")

X = dataset[['Store Number', 'Month', 'Prior 6 Month Sales',
       'Prior Year Sales']]
y = dataset['Sale']

le = preprocessing.LabelEncoder()
X['Month']=le.fit_transform(X['Month'])
X['Store Number']=le.fit_transform(X['Store Number'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(df)

dump(regressor, 'storesalesmodel.joblib')

This produces a trained model that is exported to your folder. Within Power BI, create a table containing the values you need predictions made to. Then run a predictions and merge script on top of that data:

# 'dataset' holds the input data for this script
import pandas as pd
from pandas.core.frame import DataFrame # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn import preprocessing
from joblib import load

scaler = StandardScaler()

dataset = pd.read_csv(".../Iowa Liquor Sales/Prediction Table.csv")
regr = load(".../Iowa Liquor Sales/storesalesmodelnew.joblib")
le = preprocessing.LabelEncoder()

dataset = dataset[['Store Number', 'Month', 'Prior 6 Month Sales',
       'Prior Year Sales']]

dataset['Month']=le.fit_transform(dataset['Month'])
dataset['Store Number']=le.fit_transform(dataset['Store Number'])

pred_class = regr.predict(dataset)
dataset['prediction'] = pred_class

pd.merge(dataset,dataset['Predicted Sales'].dropna() .to_frame(),how = 'left',left_index = True,   right_index = True)

Now your table has a new column with the predicted values.