
Week 23: (Not so much) Final steps

  • ainergyy
  • Apr 24, 2022
  • 2 min read

At the beginning of this past week, the main challenge was pretty much done. The UI had been completed the previous week, the dialog manager only needed some minor bug fixes, and the BERT model was stable and producing great results. As such, the only thing that needed work was replicating its success with GPT-3, in order to compare the two. From the first GPT-3 tests, it was clear that it required very little training and could easily be adapted to our use case.


Given the ease of the remaining tasks, and the fact that we felt comfortable both using pretrained BERT models and creating our own datasets, the team set more ambitious goals and decided to create a few smaller, auxiliary models. The goal was to create models for some of the tags that could both (1) map synonyms to a few canonical keywords (similar to lemmatization, but without needing an explicit mapping) and (2) identify those keywords when they were only implicit in the sentence. In the case of the "energy-parameter" tag, the first goal would mean interpreting sentences such as:

"What is my energy usage today?"


the same as:

"What is my consumption today?


On the other hand, the second goal would extract the implied energy parameter from sentences such as:

"What is going on with the solar panels?"


and process it as:

"What is going on with the generation ?"


This way, the chatbot becomes more resilient when it encounters a synonym that was not in the training set. It is worth noting that the BERT model itself is pretrained and could easily tag those synonyms as "energy-parameter", but it would not know whether the parameter was generation, consumption or flexibility.
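To give an idea of how such an auxiliary model fits into the pipeline, here is a minimal inference sketch using the Hugging Face transformers library. The checkpoint name is hypothetical (a stand-in for a small classifier fine-tuned on our own synonym dataset), and the three labels match the canonical keywords mentioned above:

```python
# Minimal sketch of the auxiliary "canonicaliser" step.
# NOTE: the model ID below is hypothetical; in practice this would be a small
# BERT-style classifier fine-tuned on our synonym dataset, with the labels
# generation / consumption / flexibility.
from transformers import pipeline

classify = pipeline("text-classification", model="ainergy/energy-parameter-classifier")

for sentence in [
    "What is my energy usage today?",       # synonym of "consumption"
    "What is going on with the solar panels?",  # implies "generation"
]:
    result = classify(sentence)[0]
    print(f"{sentence!r} -> {result['label']} ({result['score']:.2f})")
```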


As a bonus, the team also wanted the chatbot to be resilient to minor typos, and so trained a BART model, using its sequence-to-sequence capabilities to rewrite the sentence. As for the typos used to train the model, a script was created that would randomly swap two characters within a word, replace characters with their QWERTY-keyboard neighbours, remove a character, and/or write a single character multiple times.
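To make that concrete, here is a small, self-contained reconstruction of such a typo-noise script. The neighbour map and corruption probability are our own choices for illustration, not the original implementation:

```python
# Sketch of the typo-noise generator described above: swap two adjacent
# characters, hit a QWERTY neighbour, delete a character, or repeat one.
import random

QWERTY_NEIGHBOURS = {
    "q": "wa", "w": "qes", "e": "wrd", "r": "etf", "t": "ryg", "y": "tuh",
    "u": "yij", "i": "uok", "o": "ipl", "p": "o",
    "a": "qsz", "s": "awdx", "d": "sefc", "f": "drgv", "g": "fthb",
    "h": "gyjn", "j": "hukm", "k": "jil", "l": "ko",
    "z": "asx", "x": "zsdc", "c": "xdfv", "v": "cfgb", "b": "vghn",
    "n": "bhjm", "m": "njk",
}

def corrupt_word(word: str) -> str:
    if len(word) < 3:
        return word
    i = random.randrange(1, len(word) - 1)  # leave first/last char alone
    op = random.choice(["swap", "neighbour", "delete", "repeat"])
    if op == "swap":        # swap two adjacent characters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "neighbour":   # hit an adjacent key instead of the right one
        options = QWERTY_NEIGHBOURS.get(word[i].lower())
        return word[:i] + random.choice(options) + word[i + 1:] if options else word
    if op == "delete":      # drop a character
        return word[:i] + word[i + 1:]
    return word[:i] + word[i] * random.randint(2, 3) + word[i + 1:]  # repeat a char

def corrupt_sentence(sentence: str, p: float = 0.3) -> str:
    return " ".join(corrupt_word(w) if random.random() < p else w
                    for w in sentence.split())

print(corrupt_sentence("What is my energy consumption today?"))
```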


All these efforts resulted in our final chatbot, shown at peak performance in the following video:


We wish we could tell an equally epic development story for GPT-3, but OpenAI's prodigy child boringly lived up to its reputation and aced whatever we threw at it with minimal training...

We gave it a mere 66 example lines, 10% of which contained typos, and it straight up murdered our BART model on typo correction. It also performed competitively on intent detection, slot filling and even implied-slot detection.
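For readers curious what "minimal training" looks like, here is a hedged sketch of prompting GPT-3 for combined typo correction, intent detection and slot filling, using the 2022-era Completion API. The prompt, intent names and parameter labels are illustrative, not our actual training data:

```python
# Illustrative few-shot prompt for GPT-3 (openai-python < 1.0 API, as in 2022).
# The examples and labels below are made up for this sketch.
import openai

openai.api_key = "sk-..."  # set your own key

prompt = """Rewrite the sentence without typos, then extract the intent and energy parameter.

Sentence: Whaat is my consumpton today?
Corrected: What is my consumption today?
Intent: query_usage | Parameter: consumption

Sentence: Hw are the solar panels doing?
Corrected: How are the solar panels doing?
Intent: query_usage | Parameter: generation

Sentence: Waht is my flexibiliy this week?
Corrected:"""

response = openai.Completion.create(
    model="text-davinci-002",  # 2022-era GPT-3 model family
    prompt=prompt,
    max_tokens=60,
    temperature=0.0,           # deterministic output, easier to parse
    stop=["\nSentence:"],
)
print(response["choices"][0]["text"].strip())
```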


We would love to end this project on the note that our BERT-BART Frankenstein was unbeatable, but the reality is that, if not for its costs, GPT-3 would be the better option, while needing 5% of the effort, as seen in the comparison tables below:


Well, this took a turn...


The team will be focused on the presentation and the pitch next week, but be ready for a mid-week update with our final thoughts!


