Maid-chan Messenger Bot¶
(Note: As of October 5, 2018, Maid-chan is now migrated to Python 3.6+)
Maid-chan name is inspired from Sakurasou’s Artificial Intelligence.
This documentation is intended for developers.
For non-developers who are interested in using Maid-chan features, please ask me directly and head to Maid-chan Facebook Page.
Maid-chan Overview¶
Maid-chan leverages the advantage of using Facebook Messenger API. It allows us to communicate and share our application with billion of people around the world. It also eliminates the necessity of developing mobile and web application.
Maid-chan itself is not a general-purpose bot. If you like Japanese culture or have an interest in learning Japanese, then Maid-chan will probably useful for you. Currently, it supports several basic features, such as simple chatbot, daily scheduler for Kanji & Vocabulary of the day, daily greetings, RSS Feed notifier, translation via Google Translate, and Tokyo train status notification via Yahoo Japan. In addition, Maid-chan can be used as an intermediary between mobile and server-side, where you could leverage server capability (CPU) for hard tasks such as Machine Learning (image processing).
Maid-chan is mainly written in Python.
Contents¶
How to Run¶
In order to run Maid-chan properly, you need to be able to use Facebook Messenger Bot.
Maid-chan Installation¶
- Clone the code from Github.
$ git clone https://github.com/freedomofkeima/messenger-maid-chan.git
- You need to fill Facebook token information in maidchan/config.py based on maidchan/config.py.example.
# Facebook token
ACCESS_TOKEN = "FACEBOOK_PAGE_ACCESS_TOKEN"
VERIFY_TOKEN = "WEBHOOK_VERIFY_TOKEN" # Set this value by yourself
# Optional (recipient_id)
ADMIN = [
"123...9"
]
# Chatbot storage
STORAGE_ADAPTER = "chatterbot.storage.MongoDatabaseAdapter"
IS_CHATTERBOT_ENABLED = True
ACCESS_TOKEN is received when you are creating an application on Facebook.
VERIFY_TOKEN should be your secret token between Facebook and your application.
ADMIN is recipient_id of application’s admin. The only way to get this value is from communicating directly with your application, since recipient_id is not the same with user’s Facebook ID and it differs between applications.
STORAGE_ADAPTER is used to store Chatterbot learning data. By default, it will use SQLite as its default storage adapter. See Chatterbot Documentation or Storage Adapter for better reference.
3. Maid-chan is using Redis as the database. Redis can be downloaded via https://redis.io/download. It’s preferable to run Redis as a background process in port 6379 (default port).
$ screen -R redis # use screen
$ ... # run Redis in background
4. ChatterBot 0.8.X has dropped support for simple JSON storage because of performance issues. Depending on your choice, you need to have either SQLite or MongoDB and modify maidchan/config.py accordingly. Alternatively, you can set IS_CHATTERBOT_ENABLED flag to False in maidchan/config.py to disable chatbot feature.
5. It is recommended to use virtualenv for running Python code. After that, you need to install all dependencies which are specified in requirements.txt.
$ python3 -m venv venv # create virtualenv
$ source venv/bin/activate # start virtualenv
$ pip install -r requirements.txt # install Maid-chan dependencies
- Finally, you can create Maid-chan executables by executing:
$ python setup.py install
$ maidchan # start Maid-chan main logic
Maid-chan Executables¶
Maid-chan has 3 executables (recommended to use screen):
- maidchan: Maid-chan main logic which is used to communicate with Facebook. It also handles user’s input processing, such as Chatbot with ChatterBot and Translate text via Google Translate.
- maidchan_primitive: CPU intensive task which processes image with Machine Learning and returns an abstract GIF (Image Processing with Primitive).
- maidchan_scheduler: Scheduler which is used to handle daily and repetitive tasks, such as Daily Offerings, Daily Japanese Lesson, RSS Feed Notifier, and Tokyo Train Status feat Yahoo Japan.
Features Overview¶
Features List¶
Currently, Maid-chan has the following features:
- Image Processing with Primitive: For every uploaded images to Maid-chan, Maid-chan will convert it to a geometric primitive GIF via Primitive.
- Chatbot with ChatterBot: For every text messages outside the provided available commands, Maid-chan will send a text response via ChatterBot and langdetect (to detect the language validity).
- Daily Offerings: If user decides to subscribe for offerings feature, Maid-chan will send a good morning and a good night message with one additional image from offerings/stock directory.
- Daily Japanese Lesson: If user decides to subscribe for Japanese lesson feature, Maid-chan will send a Kanji (N1-N4, by choice) and a Vocabulary for each day.
- RSS Feed Notifier: The idea is similar to rss-twilio-bot, where all subscribed RSS feeds are aggregated to Facebook Messenger via Maid-chan scheduler.
- Translate text via Google Translate: For every messages which have “translate” and its language derivatives in it, Maid-chan will use Google-translate to translate the given messages (with several default configuration).
- Tokyo Train Status feat Yahoo Japan: Yahoo Japan provides real-time information of all trains status in Tokyo. Instead of checking pages periodically, Maid-chan will send push notification via Messenger if there is a trouble with the monitored lines.
Currently, Maid-chan only supports Asia/Tokyo timezone (UTC +9).
Available Commands¶
All commands receive 2 parameters: redis_client as RedisDriver object and recipient_id as user’s identifier.
help is used to get the list of all available commands from Maid-chan.
subscribe offerings is used to subscribe daily offerings.
unsubscribe offerings is used to unsubscribe daily offerings.
update offerings is used to update information (wake up & sleeping time) for daily offerings.
subscribe japanese is used to subscribe daily Japanese lesson.
unsubscribe japanese is used to unsubscribe daily Japanese lesson.
update japanese is used to update information (Kanji N1-N4 level) for daily Japanese lesson.
update name is used to change user’s nickname. By default, Maid-chan will use onii-chan to call users.
subscribe rss is used to subscribe a new RSS Feed with its pattern and let Maid-chan sends a notification if there is an update.
unsubscribe rss is used to remove one of the registered RSS Feed.
subscribe train is used to subscribe Tokyo train status notification and let Maid-chan sends a message if there is an update.
unsubscribe train is used to unsubscribe Tokyo train status notification.
show profile is used to show user’s nickname, subscription status, and preference.
Image Processing with Primitive¶
How It Works¶
Primitive is a machine-learning based program which converts images to its geometric primitives form (written in Go).
Initially, Maid-chan will validate whether the image input is either .png or .jpg.
Maid-chan creates 3 different images via Primitive. The shell command which is executed:
$ primitive -i /tmp/{input_file} -o /tmp/{primitive_output_file} -n 175 -v
Finally, ImageMagick converts those 3 generated images to a GIF file with:
$ convert -delay 1x5 -loop 0 /tmp/{primitive_output_files} /tmp/{gif_output_file}
Since primitive uses hill climbing and simulated annealing in image generation, the entire process might take several minutes depending on CPU capacity.
How to Run¶
- Ensure you have ImageMagick installed.
- Then, you can start running Maid-chan primitive worker by executing:
$ maidchan_primitive
Chatbot with ChatterBot¶
How It Works¶
ChatterBot is a machine-learning based conversational dialog engine which is able to generate responses based on collections of known conversations.
Maid-chan is trained with the provided Indonesian corpus, English corpus, and an additional corpus which is stored at maidcorpus of this project (some of them are Japanese words). Each time a user enters a statement, Chatterbot stores the text and its response for the next learning process.
ChatterBot is language independent, which means that Maid-chan could support 3 different languages quite easily.
The initial idea was to check whether user’s language is in the supported list. However, it seems the accuracy of langdetect is quite low, so it only checks whether the input is a valid language or not (e.g.: emoticon).
Storage Adapter¶
By default, ChatterBot uses chatterbot.storage.SQLStorageAdapter as its storage adapter (since version 0.8.X). For backwards compatibility reasons, we are using MongoDB adapter (chatterbot.storage.MongoDatabaseAdapter) as ChatterBot storage.
See http://chatterbot.readthedocs.io/en/stable/storage/index.html for full references.
Maid-chan Corpus¶
Maid-chan corpus consists of 3 files:
- conversations.corpus.json
- greetings.corpus.json
- trivia.corpus.json
You could append any other training materials as long as the file has the format of corpus.json.
Daily Offerings¶
How It Works¶
Maid-chan has a scheduler worker for handling various daily tasks. For example, Maid-chan will send out daily good morning and good night greetings upon subscription. This feature is known as “Daily Offerings”.
Daily offerings contain of two parts: greeting and image. Greetings are taken from maidchan/constant.py while offerings are taken from offerings/stock/ directory.
Maid-chan supports event type morning messages, e.g.: Halloween, Christmas, etc. If “force” flag is turned on, Maid-chan will use a specific message for the morning greeting. Otherwise, there’s a probability that Maid-chan will use seasonal type of messages.
In general, offerings text functionality has two types: normal and special with 2% of probability. For example, Maid-chan sends morning greeting more than half an hour later than usual because of overslept (2% chance). Greeting text on any specific day is the same for everyone, as it is stored in scheduler’s metadata.
In addition to text-based greetings, Maid-chan offers image as an addition. For all images which are stored under offerings/stock/, Maid-chan will pick 2 random images on daily basis for morning and night offerings. Each day, used offerings are moved to offerings/used/ directory. If there is no image under offerings/stock/ directory, Maid-chan will simply skip sending out images as a part of daily offerings.
How to Run¶
- You can start running scheduler by executing:
$ maidchan_scheduler
- You can subscribe & unsubscribe to Maid-chan’s daily offerings via subscribe offerings and unsubscribe offerings command in the Messenger. Upon subscription, you will be asked 2 questions: your usual waking up time and your usual sleeping time.
Daily Japanese Lesson¶
How It Works¶
One of the functionality of Maid-chan’s scheduler worker is daily Japanese lesson. Currently, daily Japanese lesson sends a random kanji based on user’s level selection (N1 - N4) and a random vocabulary (same for all levels) every 13:00 UTC+9.
The raw data is parsed from Gakuran and currently stored under data/ directory.
-
maidchan.japanese.
get_kanji
(level, current_pos=1)[source]¶ get_kanji returns a single record of the current_pos line position
level: 1 - 4 (N1 to N4) current_pos: up to number of records
Since the source data follows old format of JLPT test, there are only 4 levels in it: N1 to N4. get_kanji reads a certain line (current_pos) based on level selection from the file and returns it to the caller.
-
maidchan.japanese.
get_vocabulary
(current_pos=1)[source]¶ get_vocabulary returns a single record of the current_pos line position
current_pos: up to number of records
The original data does not distinguish vocabulary based on JLPT levels. Therefore, it only accepts a certain line (current_pos), reads it from the file, and returns it to the caller.
Finally, scheduler calls get_japanese_message to construct a daily lesson message.
How to Run¶
- You can start running scheduler by executing:
$ maidchan_scheduler
- You can subscribe & unsubscribe to Maid-chan’s daily Japanese lesson via subscribe japanese and unsubscribe japanese command in the Messenger.
RSS Feed Notifier¶
How It Works¶
Maid-chan accepts RSS Feed subscription and sends out notification for each RSS update. By default, user can use 2 default preset:
- Manga via mangaupdates.com
- Nyaa (nyaa.se)
In addition, user can also use its own custom RSS Feed source.
When the user subscribes using custom RSS, Maid-chan validates whether given URL has a valid RSS format via feedparser library.
Maid-chan uses regex comparison from user’s input to the title of feed entries. Maid-chan stores list of all matched titles from the RSS Feed.
If there is a new title which is not stored in the database, Maid-chan will send out a notification to the user that a new update is available.
Initially, timestamp is used instead of list of titles. However, some sources (mangaupdates) don’t provide timestamp information.
How to Run¶
- You can start running scheduler by executing:
$ maidchan_scheduler
- You can add & remove RSS to Maid-chan’s RSS Feed notifier via subscribe rss and unsubscribe rss command in the Messenger.
Translate text via Google Translate¶
How It Works¶
Maid-chan feat translate-shell provide an interface to use Google Translate as translation tool via Facebook Messenger.
Initially, Maid-chan will identify the given message in 4 parts:
- “translate” keyword (or “terjemahkan” in Bahasa)
- “from” keyword outside quotes, which is used to identify source language
- “to” keyword outside quotes, which is used to identify target language
- source text for translation
Since there are multiple mapping possibilities (“id”, “Indo”, “Indonesia” have the same meaning), “from” and “to” keywords are mapped to its appropriate language code. Currently, Maid-chan supports Bahasa Indonesia, English, and Japanese.
If Maid-chan could not recognize proper mapping, get_trans_language_prediction is called. This function will use Google language detection to see whether Maid-chan could map source text into 1 out of 3 supported languages.
Several default rules in Maid-chan are:
- English will be translated to Japanese
- Japanese will be translated to English
- Bahasa Indonesia will be translated to Japanese
- Default source language is Google language detection and default target language is English
Tokyo Train Status feat Yahoo Japan¶
How It Works¶
Maid-chan feat Yahoo Japan provides real-time notification for Tokyo train status which are defined to be monitored. Currently, there are 3 monitored train lines: Tokyu Den-en-toshi Line, Nambu Line, and Tokyu Oimachi Line.
Since the retrieved data is HTML-formatted, we need to parse the data that we want with BeautifulSoup. In addition, the raw data from Yahoo Japan is all written in Japanese, therefore, we need to do some pre-processing to translate those data to English. When the translation is not found in the defined constant, Maid-chan’s Google Translate feature is used for providing automatic translation.
Notification will be sent when the train status is changed, e.g.: from “Normal Operations” to “Operations temporarily suspended”, from “Operations temporarily suspended” to “Delays”, etc.
How to Run¶
- Ensure you have translate-shell installed (for automatic translation, see Translate text via Google Translate).
- You can start running scheduler by executing:
$ maidchan_scheduler
- You can subscribe & unsubscribe to Maid-chan’s Tokyo train status notification via subscribe train and unsubscribe train command in the Messenger.
Special Thanks¶
Special thanks to Jonas Obrist (ojii) for providing tokyotrainstatus codebase.
Ideas¶
Priority Ideas¶
- (Admin only) Send link to download at home, e.g.: Youtube (feat https://github.com/soimort/you-get), image files, etc
Existing Feature Improvement Ideas¶
- (Admin only) Broadcast message to all users which have talked to Maid-chan at least once
- Modifiable daily Japanese Kanji & Vocabulary time
- Automatic offerings update from upstream
Future Ideas¶
- Japanese language quiz
- Image recognition, e.g.: waifu recognizer - https://github.com/nagadomi/lbpcascade_animeface or self-created
- Natural Language processing for conversing daily conversation (Naturally we can improve it with IBM Watson or Google Cloud Speech, but the model “probably” differs from Maid-chan requirement)
- Mini games
- Location-aware features: Recommendation, weather, etc
- IoT with home electronics
- etc
New Ideas & Issues¶
If you have some interesting ideas, or if you find some bugs, please access project’s page on Github.