Hi! After an introduction to the roofline model, we are now ready to see how to use it and how to make decisions based on it. The model considered so far only provides upper bounds on the achievable performance of an application, but it does not give any insight into why the real performance of the application is below the theoretical bounds. As an example, let’s assume that our application is compute bound and that its real performance is about 25% of the peak value. To understand how the performance of the current implementation could be improved, the roofline model is enriched with additional upper bounds referred to as "ceilings". A ceiling represents a specific optimization: until that optimization is applied, the application cannot exceed the performance limit denoted by the ceiling. To better understand this concept, let’s continue with our example. As we can see, we have added two ceilings to our roofline model: the Task Level Parallelism (TLP) ceiling and the Instruction Level Parallelism (ILP) ceiling. If the optimizations described by the ceilings have not yet been applied to the application, the programmer can modify the application and implement the suggested optimizations to improve its performance. Together with the additional ceilings on the peak performance bound, the roofline model can also be enriched with "walls". A wall describes an optimization of the memory transfers without which it is not possible to reach the maximum theoretical bandwidth between the off-chip memory and the chip. Even though the roofline model was born for general-purpose processors, the same concepts have recently been explored in the FPGA domain.
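As a brief aside, the idea of ceilings described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `roofline` and `next_ceiling`, the bandwidth, the ceiling values, and the ordering of the TLP and ILP ceilings are all hypothetical and are not taken from the example shown in the lecture. Only the "25% of the peak value" comes from the text.

```python
def roofline(oi, compute_bound_gflops, bandwidth_gbs):
    """Attainable GFLOP/s at operational intensity `oi` (FLOP/byte)."""
    return min(compute_bound_gflops, bandwidth_gbs * oi)


def next_ceiling(measured_gflops, oi, ceilings, bandwidth_gbs):
    """Return (name, bound) of the lowest ceiling that is not below the
    measured performance, or None if the measurement exceeds every ceiling."""
    bounds = sorted(
        (roofline(oi, limit, bandwidth_gbs), name)
        for name, limit in ceilings.items()
    )
    for bound, name in bounds:
        if bound >= measured_gflops:
            return name, bound
    return None


# Hypothetical numbers, chosen only for illustration.
PEAK_BANDWIDTH_GBS = 20.0
CEILINGS_GFLOPS = {
    "TLP ceiling": 30.0,  # assumed limit while task-level parallelism is missing
    "ILP ceiling": 60.0,  # assumed limit while instruction-level parallelism is missing
    "peak": 100.0,        # assumed peak with every optimization applied
}

# Compute-bound application (high operational intensity) at 25% of the peak.
result = next_ceiling(25.0, 10.0, CEILINGS_GFLOPS, PEAK_BANDWIDTH_GBS)
print(result)
```

With these assumed numbers the measurement sits below the first ceiling, which would suggest checking task-level parallelism before anything else.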
You may notice that one of the main differences here is that an FPGA does not define a fixed architecture. Hence, it is not easy to clearly define walls and ceilings, since a number of possible architectures can be implemented for the very same application, as opposed to a general-purpose processor, in which the architecture is well defined and known. Nevertheless, the roofline model can still be used to understand the maximum level of performance that can be achieved on a target FPGA, so that the designer can quickly evaluate different devices and identify, at an early stage, whether or not an FPGA might fit their performance requirements, provided, of course, that the final implementation is fully optimized. Furthermore, the information on whether the current implementation lies in the I/O bound or in the compute bound area is still available within the roofline model targeting FPGAs. Indeed, we can leverage this information when optimizing our application. If we find out that the implementation lies in the I/O bound area but the current performance is far from the theoretical one, we can start thinking of rewriting our code to improve the efficiency of the memory transfers. As an example, we can exploit the fact that the FPGA lets us increase the bitwidth of the data transfers as well as the number of memory ports used for transferring data to and from the off-chip memory. Additionally, if we have already implemented efficient memory transfers, another option to increase the performance of the application is to try to move from the I/O bound area to the compute bound area by changing the operational intensity of the application. One way to achieve this goal is to leverage the local memories available on the FPGA to minimize the data transfers required and, as a consequence, to increase the number of operations performed with respect to the amount of data moved to and from the off-chip memory.
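The two I/O optimizations just described can be illustrated with a rough numerical sketch. Everything in it is an assumption made for illustration: the function names, the 200 MHz kernel clock, the 19.2 GB/s and 100 GFLOP/s peaks, and a hypothetical kernel in which every 4-byte input element is used 16 times with 2 operations per use. It also assumes one transfer per port per clock cycle.

```python
def effective_bandwidth_gbs(port_bits, num_ports, clock_mhz, peak_bw_gbs):
    """Bandwidth (GB/s) the kernel can request with the given ports,
    capped by what the off-chip memory can deliver.
    Assumes one transfer per port per clock cycle."""
    requested = port_bits / 8 * num_ports * clock_mhz * 1e6 / 1e9
    return min(requested, peak_bw_gbs)


def attainable_gflops(oi, peak_gflops, bw_gbs):
    """Roofline bound at operational intensity `oi` (FLOP/byte)."""
    return min(peak_gflops, bw_gbs * oi)


# Hypothetical device and kernel parameters.
CLOCK_MHZ = 200.0
PEAK_BW_GBS = 19.2
PEAK_GFLOPS = 100.0

# Wider and more numerous ports raise the usable bandwidth.
bw_narrow = effective_bandwidth_gbs(32, 1, CLOCK_MHZ, PEAK_BW_GBS)
bw_wide = effective_bandwidth_gbs(512, 1, CLOCK_MHZ, PEAK_BW_GBS)
bw_two_ports = effective_bandwidth_gbs(512, 2, CLOCK_MHZ, PEAK_BW_GBS)

# Local memory reuse raises the operational intensity.
N, REUSE, BYTES_PER_ELEM, OPS_PER_USE = 1_000_000, 16, 4, 2
ops = N * REUSE * OPS_PER_USE
oi_no_buffer = ops / (N * REUSE * BYTES_PER_ELEM)  # every use re-reads off-chip memory
oi_buffered = ops / (N * BYTES_PER_ELEM)           # each element is read once, then reused locally

for label, oi in (("no local buffer", oi_no_buffer), ("local buffer", oi_buffered)):
    bound = attainable_gflops(oi, PEAK_GFLOPS, bw_wide)
    print(f"{label}: OI = {oi} FLOP/byte, bound = {bound} GFLOP/s")
```

Under these assumptions the buffered version no longer has the bandwidth term as its minimum, that is, it has moved from the I/O bound area to the compute bound area.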
One more technique that can be used to increase the operational intensity is to compress the input and output data so that the overall memory traffic is reduced. Notice, indeed, that in many situations we can reduce the number of bits needed to encode our data. In the following, we will see how to use the roofline model to guide the optimization of an application on a given FPGA.
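Before moving on, the effect of a more compact data encoding on the operational intensity can be sketched numerically. The function name and all the numbers below (2 operations per element, a reduction from 32 to 8 bits per element, 12.8 GB/s of bandwidth, 100 GFLOP/s of peak compute) are hypothetical and only meant to show the trend.

```python
def operational_intensity(ops_per_elem, bits_per_elem):
    """Operations per byte moved to or from off-chip memory,
    given how many bits encode each element."""
    return ops_per_elem / (bits_per_elem / 8)


# Hypothetical kernel: 2 operations per element transferred.
OPS_PER_ELEM = 2
BW_GBS = 12.8        # assumed off-chip bandwidth
PEAK_GFLOPS = 100.0  # assumed peak compute performance

oi_32bit = operational_intensity(OPS_PER_ELEM, 32)
oi_8bit = operational_intensity(OPS_PER_ELEM, 8)

# Roofline bound before and after shrinking the encoding.
bound_32bit = min(PEAK_GFLOPS, BW_GBS * oi_32bit)
bound_8bit = min(PEAK_GFLOPS, BW_GBS * oi_8bit)
print(oi_32bit, oi_8bit, bound_32bit, bound_8bit)
```

In this sketch, moving a quarter of the bits for the same amount of computation multiplies the operational intensity by four, and the I/O bound grows by the same factor as long as the kernel stays in the I/O bound area.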