Some thoughts about Machine learning

ArneOrtlinghaus · Post by **ArneOrtlinghaus** » Fri Nov 06, 2020 7:52 pm

In recent years I have read a lot about machine learning and tried to find places to introduce low-level machine learning into our programs to enhance the user experience:
- proposals of values
- validations
- checks of existing data

I must admit that it is much more difficult than I thought:
- Machine learning needs a lot of correctly classified data. As long as little data is available, the algorithm is "stupid".
- The data must be available where the learning algorithm runs. If that is outside the company, it can be difficult to get permission under the strict European data protection rules.
- The algorithm will not always give the results the users think they need, and it needs help. So wherever ML is used, a "supervisor" is also needed who can adjust the results.
- Setting up an algorithm for a single task may take more work than thinking about decision rules or user-enabled rule creation.
- Machine learning may need resources that are not always available.
- Decision trees defined manually by experienced users are often much more precise than machine learning rules.

Nevertheless, I think that some strategies needed for machine learning are always worth thinking about:
- A standard machine learning algorithm has several inputs and one output (the answer). Every task that needs decisions has to be broken down into such single tasks with several inputs and one output.
- The inputs and the output can be single numbers, strings, collections of numbers, images, ...
- Input information must be reduced as much as possible using domain knowledge.
- If the inputs are time series (for example user responses at different times), then features like "last result", "average over the last results", "maximum over the last results" or "probabilities of different results" should be calculated.
- Manually defined rules should not be hard-coded into the program. Instead, macros or matrices with decision tree information stored in a database should be used.
- Often a variant of the "nearest neighbors algorithm" is sufficient to find good proposal values in database programs: the nearest value is the most recent database record of the current user for the currently selected end customer. If that record is too old, or the current user has never worked for the end customer, then it can be the most recent record for the end customer.
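
The "nearest record" lookup from the last point could look roughly like the following. This is only a minimal sketch in Python: the `Record` fields, the function name `propose_value` and the 180-day `max_age` limit are assumptions made for illustration, not something taken from a real program.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical shape of a stored database record.
    user: str
    customer: str
    timestamp: datetime
    value: str


def propose_value(
    records: List[Record],
    user: str,
    customer: str,
    now: datetime,
    max_age: timedelta = timedelta(days=180),  # assumed limit for "too old"
) -> Optional[str]:
    """Propose a value from the 'nearest' earlier record, or None."""
    for_customer = [r for r in records if r.customer == customer]
    # First choice: recent records of the current user for this end customer.
    own_recent = [
        r for r in for_customer
        if r.user == user and now - r.timestamp <= max_age
    ]
    # Fallback: any record for this end customer, whoever entered it.
    candidates = own_recent or for_customer
    if not candidates:
        return None
    # The most recent candidate wins.
    return max(candidates, key=lambda r: r.timestamp).value
```

In a database program the two list filters would normally be two indexed queries (the first by user and end customer, the second by end customer only) rather than a scan in memory.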

Apart from this, the user interface often needs visible changes. The user does not need one single result that is supposed to fit everything. Instead, he wants an immediate overview of which possibilities he has and how probable each one is. And it is here that all the nice theory is reduced to simple statistics.
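
As an illustration of how simple these statistics can be, here is a minimal sketch in Python that turns earlier entries into a short list of proposals with their relative frequencies. The function name `ranked_proposals` and the `top_n` parameter are made up for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ranked_proposals(past_values: Iterable[str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the most frequent earlier values with their relative frequency."""
    counts = Counter(past_values)
    total = sum(counts.values())
    if total == 0:
        return []  # no history yet, so nothing to propose
    # most_common() sorts by count, highest first.
    return [(value, count / total) for value, count in counts.most_common(top_n)]
```

The user interface can then show the whole list, for example in a drop-down with percentages, and leave the decision to the user instead of filling in one value silently.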

Of course machine learning is an interesting and important topic. It just depends on whether you are building something for thousands or millions of users, or whether you rely on data from fewer than 100 or a few hundred users.

VR · Post by VR » Sat Nov 07, 2020 3:26 pm

Two quick comments:

* There is an interesting concept for situations where you want to use supervised learning, but the available training data is limited. It's called active learning. It's based on the idea that a supervised learning algorithm is first trained on a very small sample. Then the model is used to select values that are hard to classify, and the user is asked to classify them. Based on the responses, the model is retrained and gets better. One library that uses this approach is dedupe, which performs deduplication of records.

* One interesting example of the use of machine learning IMHO is IntelliCode in Visual Studio. When you open the IntelliSense list in Visual Studio, IntelliCode uses its model (which was trained on the most popular GitHub projects) to present the 4 most relevant elements for the current situation at the top of the list. This is a great example of using machine learning in a "non-intrusive" way. The user is still the one making the choice; the ML model simply tries to reduce the time needed to search the IntelliSense list.
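
The active learning loop from the first comment can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how dedupe does it: the model is a simple nearest-centroid classifier, "hard to classify" is taken to mean a small gap between the distances to the two closest class centroids, and `ask_user` is a hypothetical callback standing in for the question to the user.

```python
import math


def centroids(labelled):
    """Train: compute the mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def margin(model, features):
    """Gap between the two closest centroids; a small gap means 'hard to classify'.

    Needs at least two labels in the model.
    """
    dists = sorted(math.dist(features, c) for c in model.values())
    return dists[1] - dists[0]


def predict(model, features):
    """Return the label of the closest centroid."""
    return min(model, key=lambda label: math.dist(features, model[label]))


def active_learning(labelled, pool, ask_user, rounds):
    """Repeatedly ask the user about the hardest item, then retrain."""
    labelled, pool = list(labelled), list(pool)
    for _ in range(min(rounds, len(pool))):
        model = centroids(labelled)
        hardest = min(pool, key=lambda f: margin(model, f))
        pool.remove(hardest)
        labelled.append((hardest, ask_user(hardest)))
    return centroids(labelled)
```

The starting sample must contain at least one example of each of two classes. The point of the loop is that the user is only asked about the borderline cases, so a few answers improve the model more than the same number of randomly chosen labels would.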

Terry · Post by **Terry** » Sun Nov 08, 2020 9:18 am

Hi Arne

There are two routes in real life that allow us to get from A to B. One is to go directly, the other is to go via C, or C then D, then E and so on.
These two routes exist within each and every computer program providing linkage between classes. In any OOP this can be considered as being “deterministic” (direct A to B ) and unsurprisingly “non-deterministic”. If your program is to work properly the end of a non-deterministic chain must always lead to a final deterministic link to output. Any object not in such a chain is redundant and should not be there.

I live in St. Albans, a beautiful city in the UK with a Cathedral dating back well over 1500 years. I know St. Albans pretty well – we have a pub close to where I live “The Three Hammers”, very popular.
We are close to a motorway, so it is no surprise that we often get visitors, coming by car. They know where they want to go, but don’t know how to get there. They drive around looking for the place they want, take all sorts of different routes, may go round in circles and/or never find the place they are looking for.
How could I help them? I could arrange to meet them at the motorway junction and take them to their intended destination. But they don’t want to tell me what that is. OK, I’ll just drive round every road/street in St Albans until they see their intended destination. I always take the same route – “my route” – since I know St. Albans, I can make sure we cover every street and do not go round in circles.

Of course, this goes on with more and more visitors, and I notice that many go to the Cathedral. So, I modify “my route” to make the Cathedral the first place I take them to, "Hammers" the second and so on.

Hope you enjoyed the visit and will come again

Terry

ArneOrtlinghaus · Post by **ArneOrtlinghaus** » Mon Nov 09, 2020 8:24 am

Hi Volkmar and Terry,

thank you for the answers. I didn't mean to cast doubt on the benefits of machine learning.
The "IntelliCode" feature in VS is a nice example of how machine learning can help. But...
- A lot of data is needed (IntelliCode's base model was trained on over 3000 top open source GitHub repositories). By selecting "top" sources, Microsoft already made a first important classification: "top" projects serve as the model, and beginners' source code is left out.
- A lot of time is needed to build a system that helps. I believe that there are some person-years of development behind it.

Arne

wriedmann · Post by **wriedmann** » Mon Nov 09, 2020 8:32 am

Hi Arne,
I have looked at machine learning several times, but I have discarded it every time because it does not fit my range of customers: there is far too little data to implement machine learning.
But nevertheless I think it is possible to take some advantage from the ideas of ML and implement something similar in our applications.
Wolfgang

Terry · Post by **Terry** » Mon Nov 09, 2020 10:47 am

Hi Arne

Yes - complex indeed.

In my "visit St. Albans" analogy I said "I notice many go to the cathedral" but of course in programming terms I have no way of noticing that unless visitors tell me specifically.

So I could ask, but that depends on whether they want to answer, which in turn depends on whether they want to answer truthfully or to mislead.

Whatever the answer, it is however given by a human.

Another way of looking at things is illustrated by these forums.

By reading user comments and requests, the development team is able to decide on suitable features to add to XSharp. Again, it will be a human-generated feature list.

IMO this makes a complete nonsense of the terms Machine Learning and Artificial Intelligence. Learning and Intelligence remain exclusively in the Animal Kingdom.

Terry

ic2 · Post by **ic2** » Tue Nov 10, 2020 12:13 pm

Hello,

Indeed, machine "learning" is highly overrated. Currently I would say there are some quite advanced search algorithms at work, which power search engines such as Google and Bing. They can retrieve many hits from a massive amount of website data, can often find the intended results by treating the search words as one logical request instead of separate words, and even cope with small typos in the search string.

However, consider the many websites with "intelligent assistants". I have not seen a single one that could do more than what the search algorithms do. If I haven't found my answer some other way, the only thing I try to get out of such an "assistant" is to be connected to a human (which is something most companies try to avoid at all costs; they would rather spend their money on advertising to get new customers than on customer service to keep current ones).

Another example is speech recognition. Although the professional version of e.g. Dragon NaturallySpeaking can understand me reasonably well, there is zero learning from mistakes, and although I gain some time, there is a higher chance that I overlook some totally unintended (=wrong) interpretation of what I said. So checking eats up most of the time gained. Not much different from more than 10 years ago. I have voice recognition in my car and after trying a few things, all understood wrongly, I left it unused. I am not the only one: if you watch episodes of Top Gear or its successor "The Grand Tour", the hosts regularly show how badly voice recognition actually works.

It may advance in time, but it goes very slowly at best. And I have yet to see the first system that really seems to learn anything, apart from some small search improvement tricks based on earlier searches.

Dick
