In the modern world, ideas about using data science for extra benefit keep popping up. For instance, Google can use your history of watched videos to recommend new ones. Online shops use recommendation systems to increase your bill. However… if companies use data for their benefit, could we do the same for our own needs, such as looking for an online English teacher?
This approach is based on my own experience and may not suit your point of view, ideas, or principles.
The second part is there
Introduction
I am keen on learning English (my native language is Russian), and I usually use different resources for it. One of them is… useful enough, except for a small downside: it has a quite limited toolset for finding a teacher. At the same time, a vast amount of data is hidden behind the scenes. The main point is that we can only see the tip of the iceberg and do not know about the information that is waiting for us.
Are you ready to find a teacher who fits your expectations? If so, follow me and I will show you how deep the rabbit hole goes.
Step 1. Collect your expectations.
Okay, we are ready to start our journey. But… first things first. What are we looking for? If we know what our goal is, we can estimate the success of our actions. I think you have your own criteria, and so do I. Let me share mine as an example.
Requirements:
- A price of no more than $20 per hour.
- Teachers able to help me prepare for Cambridge Exams (FCE)
- They have real experience
- Agree to give me homework and check it
- No more than 3 candidates
The first requirement (price) looks like the easiest one. But in practice it is harder than it seems. Let me explain.
If you have a maximum amount of money that you are ready to pay for a lesson, you may be surprised to find that this price does not buy the kind of class you are looking for.
I tried to find teachers who are ready to provide an "exam-oriented" lesson for $20. However, I received a set of options where $20 was the price for "basic English", while preparation for the FCE exam cost more. It seemed inconvenient, but we will be able to cope with it soon. For now, just keep it in mind.
The second (an exam) is my main goal. I suppose it does not require any clarification.
The third (real experience) is more complicated than the first one.
Sometimes people try to present themselves as professionals after finishing some courses. In my opinion, a certificate alone is quite a weak argument: there could be cases where people have finished a course but have no relevant skills. So I would rather consider people with real teaching experience than those who only have proof of completed courses.
The fourth (homework) also looks logical, at least to me. Learning theory, pleasant chatting during lessons… well, that is all good. But if you want to learn how to swim, you need to swim. To paraphrase: "If you want to pass the exam, you need to try to pass it, at least on sample tests". Yes, practice makes perfect. And you want someone to help you and check your progress (like a sports coach) and how you are doing. You need feedback on your writing, speaking, listening, and reading tasks. And for that, you need homework.
The last one (3 candidates max.): the website gives an opportunity to book three trial lessons at a lower price. And I would like to use these tries as efficiently as possible.
Step 2. A rough filter.
We have some data received from the website. We barely have to clean it, apart from removing some useless information.
And then our dataset could look like this:
For this stage we will consider information from the column pro_course_detail: it is a repository of information about teachers and the courses they provide.
It is time to get our hands dirty.
Firstly, we will find teachers related to the main goal: the FCE exam.
Secondly, we will separate them by price for a lesson (do you remember the situation when the price could differ from your expectations? We are going to resolve this problem).
Okay, we have initial criteria. Time to code it.
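The two criteria above could be sketched roughly as follows. This is only a minimal sketch on toy data: the real layout of pro_course_detail is not shown here, so I assume it holds a list of courses per teacher, each with hypothetical "title" and "price" fields. The teacher_id and country columns, the helper cheapest_fce_price, and the constant MAX_PRICE are also made up for illustration.

```python
import pandas as pd

# Toy data. ASSUMPTION: pro_course_detail is a list of dicts with
# "title" and "price" keys; the real structure may differ.
teachers = pd.DataFrame({
    "teacher_id": [1, 2, 3],
    "country": ["GB", "RU", "US"],
    "pro_course_detail": [
        [{"title": "Basic English", "price": 15.0},
         {"title": "FCE exam preparation", "price": 30.0}],
        [{"title": "Cambridge FCE / CAE", "price": 18.0}],
        [{"title": "Business English", "price": 12.0}],
    ],
})

MAX_PRICE = 20.0  # budget per hour, taken from the requirements list


def cheapest_fce_price(courses, keyword="fce"):
    """Return the lowest price among courses whose title mentions the keyword.

    Returns NaN when the teacher offers no such course.
    """
    prices = [c["price"] for c in courses if keyword in c["title"].lower()]
    return min(prices) if prices else float("nan")


# Criterion 1: keep only teachers who offer an FCE-related course.
teachers["fce_price"] = teachers["pro_course_detail"].apply(cheapest_fce_price)
fce_teachers = teachers[teachers["fce_price"].notna()]

# Criterion 2: compare the price of the FCE course itself (not the cheapest
# course overall) with the budget.
affordable = fce_teachers[fce_teachers["fce_price"] <= MAX_PRICE]

print(affordable[["teacher_id", "country", "fce_price"]])
```

Note that teacher 1 in the toy data has a $15 course, but their FCE course costs $30, which is exactly the trap described in Step 1. Filtering on the price of the matching course avoids it.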
The first impression
Let's try to visualize our first subset to get a general overview of the number of teachers per country.
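The numbers behind such a chart could be computed like this. It is a sketch on toy data, and the column name country is my assumption; only the labels "GB" and "RU" come from the real dataset.

```python
import pandas as pd

# Toy subset. ASSUMPTION: the country code lives in a column named "country".
subset = pd.DataFrame({
    "teacher_id": [1, 2, 3, 4, 5, 6],
    "country": ["GB", "GB", "RU", "US", "GB", "RU"],
})

# Count teachers per country, sorted from the most to the least common.
teachers_per_country = subset["country"].value_counts()

print(teachers_per_country)
```

If matplotlib is installed, calling teachers_per_country.plot.bar() should turn these counts into a bar chart.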
Hmm… it looks like people from many countries are ready to support you on your way to a Cambridge Exam. Mainly they are British (the label "GB"). I suspect it is a consequence of "the nature" of the exam. However, I am glad to see people from my motherland ("RU" means Russia) who are also ready to give you a hand.
So, we can strike two items off the list.
A price of no more than $20 per hour.
Teachers able to help me prepare for Cambridge Exams (FCE)
However, we still have others left to deal with:
- They have real experience
- Agree to give me homework and check it
- No more than 3 candidates
Step 3. Filtering by description
Here begins something the website cannot provide: a search over text descriptions.
We have some columns in our dataframe which can provide some extra information.
A bit more about people who have nothing against becoming your teacher:
- about_me: a short description of teachers as people: who they are and where they are from. Usually it covers the most basic things about their lifestyle and the like.
- about_teacher: more related to professional skills. Some are good at test preparation (IELTS, TOEFL, etc.), others can help you get ready for a job interview or teach you how to use the language with your business partners. In short, it is a specialization.
- teaching_style: information about the style of your future POTENTIAL classes, that is, how a teacher would conduct them.
- introduction: usually people fill it with some information about themselves. Sometimes it is empty or copied from other text columns.
Okay, let's do our best and try to resolve the other requirements from the list.
Let's write a function for filtering by a specific word sequence; as a result, we get a boolean mask to apply to the dataframe.
After that, we are going to create a chain/combination of boolean masks to reduce the size of our dataset. I guess that searching for expressions like "I have been…" is a good way to find teachers who have real experience. At the same time, the word "Homework" is a key indicator of people who will check your tasks.
And then we show how many candidates we have.
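A minimal sketch of this filtering could look as follows. The four text column names come from the dataset, but the toy rows, the teacher_id column, and the helper contains_phrase are made up for illustration. The real notebook may do it differently.

```python
import pandas as pd

# Text columns described above.
TEXT_COLUMNS = ["about_me", "about_teacher", "teaching_style", "introduction"]


def contains_phrase(df, phrase, columns=TEXT_COLUMNS):
    """Return a boolean mask: True where any text column contains the phrase.

    The match is case-insensitive and literal (no regular expressions);
    missing values are treated as empty strings.
    """
    mask = pd.Series(False, index=df.index)
    for col in columns:
        mask |= df[col].fillna("").str.contains(phrase, case=False, regex=False)
    return mask


# Toy data, invented for this example.
candidates = pd.DataFrame({
    "teacher_id": [1, 2, 3],
    "about_me": ["I love hiking.", "I have been teaching since 2010.", None],
    "about_teacher": ["Certified tutor.", "Exam preparation.",
                      "I have been a tutor for years."],
    "teaching_style": ["I always give homework and check it.",
                       "Homework after every class.", "Conversation only."],
    "introduction": [None, "", "Hello!"],
})

has_experience = contains_phrase(candidates, "I have been")
gives_homework = contains_phrase(candidates, "homework")

# Combine the masks: both conditions must hold.
shortlist = candidates[has_experience & gives_homework]

print(len(shortlist), "candidate(s) left")
```

Because each condition is a separate mask, it is easy to add or drop a phrase and see how the number of candidates changes.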
Many teachers were excluded from our dataset. The biggest part of it (people from Great Britain) is gone. But there is room for optimism: people from 7 different countries fit our sophisticated criteria.
The interesting thing is that someone from Russia is still there.
So, now we can cross off two more.
A price of no more than $20 per hour.
Teachers able to help me prepare for Cambridge Exams (FCE)
They have real experience
Agree to give me homework and check it
But… there is a big "BUT"
- No more than 3 candidates
And now it looks like… we are stuck at this step.
Summary
We picked the low-hanging fruit, using explicit features of our dataset. Moreover, we used underestimated pieces of information from the text descriptions. But unfortunately, that is not enough to get things done.
So… it is time to take a break, look at the "nature" of the subject domain from another point of view, and then cope with the problem.
There is every indication that something will happen in the second part of this story…
P.S. The IPython notebook is located there.