Classifier

TextAnalysis currently offers a Naive Bayes classifier for text classification.

To load the Naive Bayes classifier, use the following command:

using TextAnalysis: NaiveBayesClassifier, fit!, predict

Basic Usage

Using the classifier takes three steps.

1- Create an instance of the Naive Bayes classifier model:

TextAnalysis.NaiveBayesClassifier (Type)
NaiveBayesClassifier([dict, ]classes)

A Naive Bayes Classifier for classifying documents.

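The constructor takes two arguments:

  • classes: a vector of the possible classes (labels) that a document can belong to.
  • dict (optional): a vector of known tokens (words). It is extended automatically when fit! (step 2) encounters a token that is not yet in it. In the examples below, tokens missing from dict (such as "a") do not affect the probabilities returned by predict (step 3).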
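The printed model in the example below suggests three pieces of state: the token list dict, the classes, and an integer weights matrix with one row per token and one column per class. The following is a minimal sketch of that state, not the package's source. It assumes every count starts at 1 (add-one smoothing), which is consistent with the matrices printed in these examples; SketchNB is a hypothetical name used only for illustration.

```julia
# Hypothetical stand-in for the classifier's state; not TextAnalysis's own type.
mutable struct SketchNB{T}
    dict::Vector{String}   # known tokens, one row of `weights` each
    classes::Vector{T}     # possible labels, one column of `weights` each
    weights::Matrix{Int}   # token counts per class
end

# Assumption: every count starts at 1 (add-one smoothing).
SketchNB(dict::Vector{String}, classes::Vector{T}) where {T} =
    SketchNB{T}(dict, classes, ones(Int, length(dict), length(classes)))

# Mirrors the optional `dict` argument: start from an empty vocabulary.
SketchNB(classes::Vector{T}) where {T} = SketchNB(String[], classes)

m = SketchNB([:spam, :non_spam])
size(m.weights)   # (0, 2): no tokens yet, two classes
```

An empty vocabulary gives a 0×2 matrix, which matches the Matrix{Int64}(undef, 0, 2) shown in the REPL output.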

Example

julia> using TextAnalysis: NaiveBayesClassifier, fit!, predict

julia> m = NaiveBayesClassifier([:spam, :non_spam])
NaiveBayesClassifier{Symbol}(String[], [:spam, :non_spam], Matrix{Int64}(undef, 0, 2))

julia> fit!(m, "this is spam", :spam)
NaiveBayesClassifier{Symbol}(["this", "is", "spam"], [:spam, :non_spam], [2 1; 2 1; 2 1])

julia> fit!(m, "this is not spam", :non_spam)
NaiveBayesClassifier{Symbol}(["this", "is", "spam", "not"], [:spam, :non_spam], [2 2; 2 2; 2 2; 1 2])

julia> predict(m, "is this a spam")
Dict{Symbol, Float64} with 2 entries:
  :spam     => 0.59883
  :non_spam => 0.40117
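The probabilities above can be checked by hand. The derivation below is a sketch that assumes each class column of the weight matrix W is normalized into token probabilities, the probabilities of the query tokens found in the vocabulary are multiplied (the unseen token "a" contributes nothing), no class prior is applied, and the products are normalized across classes. These assumptions are inferred from the printed numbers, which they reproduce exactly. After the two fit! calls, W is [2 2; 2 2; 2 2; 1 2] for the tokens this, is, spam, not, so the column sums are 7 (spam) and 8 (non_spam).

```latex
\begin{aligned}
P(w \mid c) &= \frac{W_{w,c}}{\sum_{w'} W_{w',c}} \\
P(c \mid d) &= \frac{\prod_{w \in d} P(w \mid c)}{\sum_{c'} \prod_{w \in d} P(w \mid c')} \\
\prod_{w \in d} P(w \mid \text{spam}) &= \left(\tfrac{2}{7}\right)^3 = \tfrac{8}{343} \\
\prod_{w \in d} P(w \mid \text{non\_spam}) &= \left(\tfrac{2}{8}\right)^3 = \tfrac{1}{64} \\
P(\text{spam} \mid d) &= \frac{8/343}{8/343 + 1/64} = \frac{512}{855} \approx 0.59883
\end{aligned}
```

Here d ranges over the query tokens that appear in dict (is, this, spam), and the complementary value 343/855 ≈ 0.40117 is the non_spam probability.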

2- Fit the model weights on labeled input:

TextAnalysis.fit! (Function)
fit!(model::NaiveBayesClassifier, str, class)
fit!(model::NaiveBayesClassifier, ::Features, class)
fit!(model::NaiveBayesClassifier, ::StringDocument, class)

Update the model's weights (per-class token counts) with the tokens of the input, labeled as class.
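The fitting step can be sketched as follows, under assumptions inferred from the REPL output rather than taken from the package source: the input is lowercased and split on whitespace, each new token gets a row of ones appended to the weights, and each token occurrence adds 1 to the labeled class's column. SketchNB, sketch_tokens, and sketch_fit! are hypothetical names.

```julia
# Hypothetical stand-in for the classifier's state; not TextAnalysis's own type.
mutable struct SketchNB{T}
    dict::Vector{String}
    classes::Vector{T}
    weights::Matrix{Int}
end

SketchNB(classes::Vector{T}) where {T} =
    SketchNB{T}(String[], classes, ones(Int, 0, length(classes)))

# Assumption: lowercase + whitespace split; the package's tokenizer may differ.
sketch_tokens(str::AbstractString) = split(lowercase(str))

function sketch_fit!(m::SketchNB, str::AbstractString, class)
    col = findfirst(==(class), m.classes)
    col === nothing && throw(ArgumentError("unknown class: $class"))
    for tok in sketch_tokens(str)
        row = findfirst(==(tok), m.dict)
        if row === nothing
            # New token: grow the vocabulary by one row of ones (add-one smoothing).
            push!(m.dict, tok)
            m.weights = vcat(m.weights, ones(Int, 1, length(m.classes)))
            row = length(m.dict)
        end
        m.weights[row, col] += 1   # count this occurrence for `class`
    end
    return m
end

m = SketchNB([:spam, :non_spam])
sketch_fit!(m, "this is spam", :spam)
sketch_fit!(m, "this is not spam", :non_spam)
m.weights   # rows follow m.dict == ["this", "is", "spam", "not"]
```

This reproduces the [2 2; 2 2; 2 2; 1 2] matrix from the spam example. The package may order newly added tokens differently (the legal/financial example lists "financial" first), and a Dict from token to row would avoid the linear findfirst search on large vocabularies.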

3- Predict class probabilities for new input:

TextAnalysis.predict (Function)
predict(::NaiveBayesClassifier, str)
predict(::NaiveBayesClassifier, ::Features)
predict(::NaiveBayesClassifier, ::StringDocument)

Return a Dict mapping each class to its predicted probability for the input, which may be a string, Features, or a StringDocument.
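The prediction step can be sketched as below. This is an assumption inferred from the documented numbers, not the package's code: counts are normalized per class column, query tokens missing from dict are dropped, the per-class product of token probabilities is taken (with token counts as exponents), no class prior is applied, and the scores are normalized to sum to 1. sketch_features and sketch_predict are hypothetical names.

```julia
# Hypothetical helper: count occurrences of each known token in the query.
# Tokens missing from `dict` are dropped.
function sketch_features(dict::Vector{String}, str::AbstractString)
    x = zeros(Int, length(dict))
    for tok in split(lowercase(str))
        i = findfirst(==(tok), dict)
        i === nothing || (x[i] += 1)
    end
    return x
end

function sketch_predict(dict, classes, weights::Matrix{Int}, str)
    x = sketch_features(dict, str)
    probs = weights ./ sum(weights; dims = 1)   # P(token | class), column-wise
    scores = vec(prod(probs .^ x; dims = 1))    # product over tokens, one score per class
    scores ./= sum(scores)                      # normalize across classes
    return Dict(c => s for (c, s) in zip(classes, scores))
end

# State after the two fit! calls in the spam example.
dict = ["this", "is", "spam", "not"]
classes = [:spam, :non_spam]
weights = [2 2; 2 2; 2 2; 1 2]
sketch_predict(dict, classes, weights, "is this a spam")
# :spam ≈ 0.59883 (512/855), :non_spam ≈ 0.40117 (343/855)
```

Multiplying raw probabilities is fine for short queries like these; for long documents a sum of logarithms would avoid floating-point underflow.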

Example

julia> using TextAnalysis

julia> m = NaiveBayesClassifier([:legal, :financial])
NaiveBayesClassifier{Symbol}(String[], [:legal, :financial], Matrix{Int64}(undef, 0, 2))

julia> fit!(m, "this is financial doc", :financial)
NaiveBayesClassifier{Symbol}(["financial", "this", "is", "doc"], [:legal, :financial], [1 2; 1 2; 1 2; 1 2])

julia> fit!(m, "this is legal doc", :legal)
NaiveBayesClassifier{Symbol}(["financial", "this", "is", "doc", "legal"], [:legal, :financial], [1 2; 2 2; … ; 2 2; 2 1])

julia> predict(m, "this should be predicted as a legal document")
Dict{Symbol, Float64} with 2 entries:
  :legal     => 0.666667
  :financial => 0.333333
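The 2/3 result can be checked by hand, assuming (as the printed values suggest) that each class column is normalized into token probabilities, only query tokens already in the vocabulary count, tokens are matched exactly (so "document" does not match "doc"), and no class prior is applied. Only "this" and "legal" are known. Each column has a total of 9: five vocabulary rows that start at 1, plus four token occurrences from its own training document. The visible rows give this = [2 2] and legal = [2 1].

```latex
\begin{aligned}
P(\text{this} \mid \text{legal})\,P(\text{legal} \mid \text{legal}) &= \tfrac{2}{9} \cdot \tfrac{2}{9} = \tfrac{4}{81} \\
P(\text{this} \mid \text{financial})\,P(\text{legal} \mid \text{financial}) &= \tfrac{2}{9} \cdot \tfrac{1}{9} = \tfrac{2}{81} \\
P(\text{legal} \mid d) &= \frac{4/81}{4/81 + 2/81} = \tfrac{2}{3} \approx 0.666667
\end{aligned}
```

The single extra occurrence of "legal" in the legal column is what tips the prediction. Common words such as "this" carry equal weight in both classes.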