Machine learning is a relatively new technology, made possible both by rapid advances in computing power and by the paradigm known by the buzzword "big data." At its core, machine learning is the crunching of data by computers to recognize patterns in that data. Give a computer enough data to chew on, and it can become very accurate at recognizing those patterns.
To get a computer to recognize patterns in data through machine learning, you need only a couple of things. The most important is a large dataset in which every example carries a label. For the basics, think of a huge Excel spreadsheet with labeled columns and anywhere from fifty rows to hundreds of thousands of rows. As an example, imagine a dataset of identifying features of fruits with three columns: Weight, Texture, and Label. The Label column is always needed in this kind of machine learning, because it names the very thing you want the algorithm to learn to predict.
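To make the spreadsheet picture concrete, here is a minimal sketch of such a dataset in plain Python. The specific weights and textures are made up for illustration; a real dataset would have far more rows:

```python
# A toy version of the fruit dataset described above: each row is one example,
# and the Label column is the answer the algorithm should learn to predict.
dataset = [
    # (Weight in grams, Texture, Label)
    (150, "bumpy",  "orange"),
    (170, "bumpy",  "orange"),
    (140, "smooth", "apple"),
    (130, "smooth", "apple"),
]

for weight, texture, label in dataset:
    print(weight, texture, label)
```

A real dataset stores the same three columns, just with hundreds or thousands of rows instead of four.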
Oranges tend to weigh 150-180 grams and have bumpy skin, while apples tend to weigh 120-150 grams and be smooth. Feed 100 rows of weight and texture data for example apples and oranges into a machine learning algorithm, and it will notice that very same thing. Then, if you feed the trained algorithm new data, say 20 rows of apples and oranges different from those used to train it, it will likely identify most of them correctly as either an apple or an orange. Machine learning algorithms are freely available online, distributed by big software companies; one example is Google's TensorFlow. Because these libraries do much of the heavy lifting for you, running a dataset through one can take as little as six lines of code.
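The train-then-predict loop described above can be sketched in just a few lines. This example uses scikit-learn's decision tree rather than TensorFlow, purely because it is the shortest illustration; the weights and the 0 = bumpy / 1 = smooth encoding are made up for the example:

```python
# A toy fruit classifier: each example is (weight in grams, texture),
# with texture encoded as 0 = bumpy, 1 = smooth.
from sklearn import tree

# Hypothetical training rows and their labels.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = ["apple", "apple", "orange", "orange"]

# Train the algorithm on the labeled examples.
clf = tree.DecisionTreeClassifier()
clf.fit(features, labels)

# Ask it about a fruit it has never seen: 160 g and bumpy.
print(clf.predict([[160, 0]]))  # prints ['orange']
```

With only four training rows this is a caricature of real machine learning, but the shape is the same: labeled examples go in, a trained model comes out, and the model labels new data it was never shown.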