Usually we develop and debug machine learning systems on general-purpose computers. Nowadays, however, machine learning systems are deployed not only on personal computers but also on custom hardware, especially in this era of the Internet of Things. Some examples you might be familiar with include smart mobile devices, robots of various sizes, smart cars, and home electronic appliances such as smart refrigerators, smart televisions, even light bulbs.

The thing is, these devices have limited memory capacity and computational power. We're used to using servers, general-purpose computers, or clusters when developing and training machine learning models. These provide high memory bandwidth and extensive processing capacity, which is great if the model will operate in that environment, not so much if it needs to run on a light bulb. This gap in resources between the development and production environments is a major challenge we need to address before deploying any machine learning model to the real world. The main goal here is to make sure that your production machine learning models have reasonable performance. In particular, you need to watch for smooth execution, without any lagging or downtime, outside of the development environment.

So how do we do this? Let's go over some highlights. Complex and very large models, such as deep neural networks, usually require a lot of memory and a lot of processing power to execute efficiently. Simpler learning algorithms, such as linear regression models, may not have such high resource demands, but they might not achieve the expected performance benchmarks either. This gives rise to a trade-off between performance and resource demands, which we've discussed in previous courses. In some scenarios we can sacrifice resource demands to achieve high performance: for example, increase the size of the robot so that we can include a larger memory, a better processor, and more power to run them. In other scenarios we can sacrifice performance to fit the resource constraints: we use a less accurate but simpler model that runs on the hardware we have. However, when we can't sacrifice one for the other, that's when innovation happens.

Writing more efficient code is an obvious step toward resource-efficient machine learning, and it applies at all stages, from training to prediction. For instance, look at the data structures you're using, and I mean look, don't prematurely optimize: time your code before and after a tweak and compare the results (there's a short sketch of this, and of precomputing, right after this passage). Because of sophisticated compilers, what seems like it should be more efficient can sometimes be worse. But simple things, like reducing the number of variables, getting rid of useless features early on, and choosing efficient data types to store your data, can all go a long way toward using memory more efficiently. You can also look at optimized libraries that are designed to run faster; unless you're an expert coder, replacing custom code with optimized and tested code often speeds things up (the second sketch below shows this with a library call). Depending on how often a calculation is done, precomputing values and storing them in memory, rather than recalculating them, can save time.

Python is currently the preferred programming language for machine learning because of the various advantages it carries. However, it is an interpreted language and may run slower than compiled languages such as C and C++, at least slower than well-written C and C++.
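Here is a minimal sketch, in plain Python, of the "measure, don't guess" and precomputing advice. The score function, the size of the lookup table, and the timing setup are illustrative assumptions, not anything specific from this course.

```python
# A minimal sketch: time your code before and after a tweak, and precompute
# values you need repeatedly instead of recalculating them every time.
import math
import timeit

def score(x):
    """Stand-in for an expensive calculation done at prediction time."""
    return math.exp(-x) * math.sin(x)

# Version 1: recalculate the value on every call.
def predict_recalculate(inputs):
    return [score(x) for x in inputs]

# Version 2: precompute the values once and look them up from memory.
LOOKUP = {x: score(x) for x in range(100)}

def predict_precomputed(inputs):
    return [LOOKUP[x] for x in inputs]

# Compare the two as suggested: don't assume, measure.
inputs = list(range(100)) * 100
print("recalculate:", timeit.timeit(lambda: predict_recalculate(inputs), number=100))
print("precomputed:", timeit.timeit(lambda: predict_precomputed(inputs), number=100))
```

Whether the lookup table wins depends on how expensive the calculation is and how often the same inputs recur, which is exactly why you time both versions rather than guessing.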
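And here is a similar sketch of the "optimized libraries" point, assuming NumPy, which the lecture doesn't name explicitly: the same dot product written as a hand-rolled loop and as a single call into a library that is backed by optimized C.

```python
# A minimal sketch: replacing custom code with an optimized, well-tested
# library routine that computes the same thing.
import timeit
import numpy as np

weights = np.random.rand(1000)
features = np.random.rand(1000)

def dot_pure_python(w, x):
    # Hand-rolled dot product in pure Python.
    total = 0.0
    for wi, xi in zip(w, x):
        total += wi * xi
    return total

def dot_numpy(w, x):
    # Same calculation, delegated to NumPy's optimized implementation.
    return np.dot(w, x)

# Again: measure both versions and compare.
print("pure python:", timeit.timeit(lambda: dot_pure_python(weights, features), number=1000))
print("numpy:", timeit.timeit(lambda: dot_numpy(weights, features), number=1000))
```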
You can take advantage of Python compilation tools, like PyPy and Cython, to gain efficiencies almost automatically. One reason Python is popular for machine learning is that there are already a lot of great libraries built around highly optimized C code. But for truly resource-constrained hardware, you're probably going to want an expert in a compiled language.

Another important design choice is deciding whether to run the machine learning algorithm in the cloud; many smart mobile applications have used this kind of approach in the past. This gets you all the computing resources you could possibly want, but it creates its own issues, such as connectivity and network latency. With recent improvements in the memory and processing capacity of smartphones, performing the predictions locally might be a better choice.

Of course, the size of the QuAM, or model, created by the machine learning algorithm can be tweaked as well. One major approach to this is called model compression. Model compression aims at taking large machine learning models and converting them into something more efficient, without hindering the performance of the original. One approach to model compression is known as the student-teacher approach. In the student-teacher approach, a large and complex neural network is trained as usual; this is called the teacher. You aim to find the model that gives the best performance you need. Then the student is trained. Rather than repeating the architecture of the teacher, a smaller and simpler structure is enforced, and the student trains solely on the output of the teacher. Surprisingly enough, under this approach you can end up with similar model accuracy without the intensive resource requirements demanded by the teacher model (there's a small sketch of this idea at the end of this section).

Early in your life cycle process, we told you to take note of the constraints your QuAM needs to operate under; the time and space complexity of the operational environment is one big category to consider. Exploring other model compression methods, and looking at other projects within your domain, will give you a better understanding of the potential. With careful forethought, you'll be able to meet the requirements for useful machine learning products.
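To make the student-teacher idea concrete, here is a minimal sketch assuming PyTorch. The network sizes, the temperature, the optimizer settings, and the number of steps are illustrative assumptions, the random inputs stand in for real data, and the teacher is left untrained just so the sketch runs on its own; in practice you would train the teacher first and then distill it.

```python
# A minimal student-teacher (knowledge distillation) sketch in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

# "Teacher": a larger network. In practice this would already be trained;
# here it only stands in so the sketch is self-contained.
teacher = nn.Sequential(nn.Linear(20, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(),
                        nn.Linear(256, 10))
teacher.eval()  # frozen; we only read its outputs

# "Student": a much smaller, simpler network we actually want to deploy.
student = nn.Sequential(nn.Linear(20, 32), nn.ReLU(),
                        nn.Linear(32, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 4.0  # softens the teacher's output distribution

def distillation_loss(student_logits, teacher_logits, T):
    """KL divergence between the softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T

# The student trains solely on the teacher's outputs; random inputs stand in
# for the real (unlabeled) data you would normally pass through the teacher.
for step in range(100):
    x = torch.randn(64, 20)
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = distillation_loss(student_logits, teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The deployable artifact is the small student network, which needs far less memory and compute at prediction time than the teacher it learned from.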