Guess Band

In this post I will describe how to work with the fasttext text classifier.

Fasttext is a machine learning library for text classification. Let’s try to teach her to identify a metal band by the name of the song. For this, we use supervised learning using a dataset.

Let’s create a dataset of songs with group names:

__label__metallica the house jack built
__label__metallica fuel
__label__metallica escape
__label__black_sabbath gypsy
__label__black_sabbath snowblind
__label__black_sabbath am i going insane
__label__anthrax anthrax
__label__anthrax i'm alive
__label__anthrax antisocial

Training sample format:

{__label__class} {example from class} 

Let’s train fasttext and save the model:

model = fasttext.train_supervised ("train.txt")
model.save_model ("model.bin")

Let’s load the trained model and ask to identify the group by the song name:

model = fasttext.load_model ("model.bin")
predictResult = model.predict ("Bleed")
print (predictResult) 

As a result, we will get a list of classes that this example looks like, indicating the level of similarity by a number, in our case, the similarity of the Bleed song name to one of the dataset groups.
In order for the fasttext model to be able to work with a dataset that goes beyond the boundaries of the training sample, the autotune mode is used using a validation file (test file). During autotune, fasttext selects the optimal model hyperparameters, validating the result on a sample from the test file. The autotune time is limited by the user independently by passing the autotuneDuration argument.
An example of creating a model using a test file:

model = fasttext.train_supervised ("train.txt", autotuneValidationFile = "test.txt", autotuneDuration = 10000) 

Sources / tutorial / 2018/04/12 / fasttext-for-windows

Source code MachineLearning / – / tree / master / 6bandClassifier