Research on malware analysis and detection has made notable progress over the years, but it has focused mainly on malicious programs for Windows systems. Meanwhile, the adoption of Linux-based machines (e.g. servers, desktops, IoT devices) is rapidly increasing, attracting the attention of malware writers. Linux malware poses new challenges, ranging from its ability to target a broad range of CPU architectures to the need to study malicious techniques that differ from those seen in the Windows world.
In this presentation we propose the first automatic pipeline for large-scale analysis of Linux malware. Our system aims to avoid, or at least limit, the reverse engineering effort that is usually performed manually. For example, some analysis modules, run as parallel jobs, perform static analysis on the binary and its ELF header, while the dynamic analysis modules run Linux samples in sandboxes for the x86, ARM, MIPS or PPC architectures.
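As a rough illustration of the static ELF header step, and of how a sample might be routed to a sandbox of the matching architecture, the sketch below reads the `e_ident` and `e_machine` fields of a header. The function name `detect_arch` and the architecture labels are hypothetical and are not taken from the pipeline described here; only the numeric `e_machine` constants come from the ELF specification.

```python
import struct

# e_machine values from the ELF specification, mapped to illustrative labels.
EM_TO_ARCH = {
    3: "x86", 62: "x86-64", 40: "ARM", 183: "AArch64",
    8: "MIPS", 20: "PPC", 21: "PPC64",
}

def detect_arch(header: bytes):
    """Return (arch, bits, endianness), or None if this is not a plausible ELF header.

    Hypothetical helper for illustration only.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    ei_class, ei_data = header[4], header[5]   # EI_CLASS, EI_DATA
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    endian = "<" if ei_data == 1 else ">"      # 1 = little endian, 2 = big endian
    # e_type and e_machine are the two 16-bit fields right after e_ident.
    _e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    arch = EM_TO_ARCH.get(e_machine, "unknown(%d)" % e_machine)
    bits = 32 if ei_class == 1 else 64
    return arch, bits, "little" if ei_data == 1 else "big"
```

Reading only the first 20 bytes keeps the check cheap enough to run on every sample before any heavier analysis job is scheduled.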
The talk will cover what reverse engineers should look for when dissecting Linux samples: anomalies that can arise in the ELF header, persistence and evasion strategies, and how we gain insight into malware interactions with the system. All these cases are recent and have been observed in the wild in the last year. Finally, we will explain how we built multi-architecture sandboxes using kprobes and uprobes, debugging facilities integrated in the Linux kernel. The analysis infrastructure for Linux-based malware will be released as a free service. Moreover, the full dataset will be provided to researchers upon request.
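To make the notion of an ELF header anomaly more concrete, here is a minimal sketch of one possible check: whether the section header table described by the header is missing or points outside the file. The abstract does not say which anomalies are covered, so this particular check, the function name `section_table_anomalies` and the messages it returns are assumptions chosen for illustration.

```python
import struct

def section_table_anomalies(data: bytes):
    """Return a list of anomalies in the ELF header of `data` (hypothetical helper).

    Simplification: extended section numbering (SHN_XINDEX) is not handled.
    """
    if len(data) < 16 or data[:4] != b"\x7fELF":
        return ["not an ELF file"]
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return ["invalid EI_CLASS or EI_DATA"]
    endian = "<" if ei_data == 1 else ">"
    # Header fields after e_ident; the address and offset fields are 4 bytes
    # wide in ELF32 and 8 bytes wide in ELF64.
    fmt = endian + ("HHIIIIIHHHHHH" if ei_class == 1 else "HHIQQQIHHHHHH")
    if len(data) < 16 + struct.calcsize(fmt):
        return ["truncated ELF header"]
    fields = struct.unpack_from(fmt, data, 16)
    e_shoff = fields[5]
    e_shentsize, e_shnum, e_shstrndx = fields[10], fields[11], fields[12]

    issues = []
    if e_shnum == 0 or e_shoff == 0:
        issues.append("no section header table")
    elif e_shoff + e_shentsize * e_shnum > len(data):
        issues.append("section header table extends past end of file")
    if e_shnum and e_shstrndx >= e_shnum:
        issues.append("e_shstrndx out of range")
    return issues
```

Neither condition makes a file malicious on its own, since the kernel does not need section headers to load a program, but both can cause trouble for analysis tools that rely on them.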