What is Data Processing in Machine Learning?


After you have a team in place and a rough idea of how all of this will fit together, it’s time to turn your attention to the hardware that will do the work for you. Think about how often the data processing jobs will run. If they run only once in a while, investing in hardware for the long term might be a false economy. It makes more sense to start with what you have on hand and add capacity as you notice processing times and job frequency growing.
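One way to notice that growth is to record how long each job takes and compare recent runs with earlier ones. The following is a minimal sketch, not a prescribed method: the helper names timed_run and is_slowing_down, the window of three runs, and the 1.5x threshold are all assumptions chosen for illustration.

```python
import time
from statistics import mean


def timed_run(job, history):
    """Run job(), append its wall-clock duration in seconds to history,
    and return whatever the job returned."""
    start = time.perf_counter()
    result = job()
    history.append(time.perf_counter() - start)
    return result


def is_slowing_down(history, window=3, factor=1.5):
    """Return True when the mean of the last `window` durations exceeds
    `factor` times the mean of the first `window` durations.
    The window and factor defaults are arbitrary illustrative values."""
    if len(history) < 2 * window:
        return False  # not enough runs recorded to compare
    return mean(history[-window:]) > factor * mean(history[:window])


# Record one real run of a small stand-in job.
durations = []
timed_run(lambda: sum(range(100_000)), durations)

# Check a made-up history where later runs take about twice as long.
print(is_slowing_down([1.0, 1.1, 0.9, 2.0, 2.2, 2.1]))  # True
```

If a check like this starts returning True regularly, that is a reasonable signal to look at adding capacity.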

1. Using Your Computer:
Yes, you can use your own machine, either a desktop or a laptop. I do my development on an Apple MacBook Pro. I run the likes of Hadoop on this machine as it’s pretty fast, and I’m not using terabytes of data. There’s nothing to stop you from using your machine; it’s available, and it saves the financial outlay of buying more machines.

There can be limitations. While a heavy job is running, you might have to restrict yourself to less processor-intensive tasks on the same machine, but never rule out the option of using your own machine. Operating systems like Linux and macOS tend to be preferred over Windows, especially for Big Data operations.

2. A Cluster of Machines:
Eventually, you’ll come across a scenario that requires a cluster of machines to do the work. Frameworks like Hadoop are designed to run across clusters of machines, which makes it possible to distribute the work and process it in parallel. Ideally, the machines should be on the same network to reduce network latency.
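To give a feel for how work gets distributed, here is a single-machine sketch of the split, process, and combine pattern applied to a word count. It does not use Hadoop: Python threads stand in for the cluster's machines, so it only illustrates the shape of the approach, and the function names (count_words, split_into_chunks, parallel_word_count) are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Process one chunk: count the words in a list of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks slices of roughly equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_word_count(lines, n_workers=4):
    """Hand each chunk to a worker, then merge the partial counts."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = ["big data on a cluster", "a cluster of machines", "data data data"]
print(parallel_word_count(lines, n_workers=2)["data"])  # 4
```

On a real cluster each chunk would live on a different machine, which is why keeping those machines on the same network matters: the partial results have to travel back to be combined.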

3. Cloud-Based Services:
If the thought of maintaining and paying for your own hardware does not appeal, then consider using a cloud-based service. Vendors such as Amazon, Rackspace, and others provide scalable servers where you can increase or decrease the number of machines and the amount of power you require.

The advantage of these services is that they are “turn on/turn off” technology, enabling you to use only what you need. Keep a close eye on the cost of cloud-based services, though, as over longer periods they can sometimes prove more expensive than a standard hosting option.
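A quick break-even calculation makes that cost comparison concrete. The sketch below assumes the simplest possible pricing model, a flat hourly rate for the cloud server against a flat monthly fee for standard hosting. Both prices are hypothetical figures picked for the arithmetic, not quotes from any vendor.

```python
def monthly_cloud_cost(hours_used, hourly_rate):
    """Cost of an on-demand server billed only for the hours it runs."""
    return hours_used * hourly_rate


def break_even_hours(fixed_monthly_cost, hourly_rate):
    """Hours per month at which on-demand cost equals the fixed fee."""
    return fixed_monthly_cost / hourly_rate


def cheaper_option(hours_used, hourly_rate, fixed_monthly_cost):
    """Return 'cloud' or 'fixed', whichever costs less (ties go to 'fixed')."""
    if monthly_cloud_cost(hours_used, hourly_rate) < fixed_monthly_cost:
        return "cloud"
    return "fixed"


HOURLY_RATE = 0.50     # hypothetical on-demand price per hour
FIXED_MONTHLY = 120.0  # hypothetical standard hosting price per month

print(break_even_hours(FIXED_MONTHLY, HOURLY_RATE))    # 240.0
print(cheaper_option(40, HOURLY_RATE, FIXED_MONTHLY))   # cloud
print(cheaper_option(720, HOURLY_RATE, FIXED_MONTHLY))  # fixed
```

With these made-up numbers, a server that runs for fewer than 240 hours a month is cheaper on demand, while one that runs around the clock (about 720 hours) is cheaper on the fixed plan.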