It has passed some time since I published a Naive_Bayes for text-classifier written in Ruby. At that time the API was designed very much influenced in way I have written a similar classifier in C++. However it was after a looked at the gem Decider that I realized how it was not taking advantage over the dynamic characteristics of Ruby to make Domain-Specific-Language APIs.
Static characteristics of my original Ruby classifier interface
The following are the methods that called my attention:Category declaration
Use of add_category method to declare new categories expects a category name and an array of strings as an example.classifier.add_category(name: :not_spam, training_set: [ 'Hi', 'Joe', 'how', 'are', 'you' ])
Input classification
The classify method makes use of an array of string as input for classification.category = classifier.classify(['Hi','James','are','you', 'going']) category.to_s # not_spam
Refactoring to new style
Category declaration through method missing and string
Instead of adding categories with a training set the user should only be required to provide examples to its desired with a dynamically created method which corresponds to a category name.classifier.not_spam << "Hi Joe, how are you"
Input classification from string
Instead of requiring the user to break the string in an array the entire string can be classified. The classifier should also take the responsibility of breaking the string in words.category = classifier.classify "Hi James,are you going ?" category.to_s # not_spam
No comments:
Post a Comment