Implementing statistical sampling in the ATLAS TDAQ Network

This is my bachelor project, carried out while working in the ATLAS TDAQ Networking Group, which is part of the ATLAS experiment at the Large Hadron Collider at CERN.

The ATLAS data acquisition system consists of four different networks interconnecting up to 2000 processors using up to 200 edge switches and five multi-blade chassis devices. For performance monitoring and troubleshooting purposes there was an imperative need to identify and quantify individual traffic flows. sFlow, an industry standard based on statistical sampling, attempts to address this need.
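
As a rough illustration of how statistical sampling can quantify flows, the sketch below scales per-flow sample counts by the sampling rate and attaches the commonly quoted sFlow rule of thumb for the 95% relative error, 196 * sqrt(1/c) percent for c samples. The function name, the (source, destination) flow key, the addresses and the sampling rate of 1 in 512 are assumptions made for the example and are not taken from the ATLAS system.

```python
import math
from collections import Counter


def estimate_flows(samples, sampling_rate):
    """Scale per-flow sample counts to estimated packet totals.

    samples: iterable of (src, dst) tuples, one per sampled packet
             (hypothetical flow key chosen for this sketch).
    sampling_rate: N, meaning that on average 1 in N packets is sampled.
    Returns {flow: (estimated_packets, percent_error)}.
    """
    counts = Counter(samples)
    estimates = {}
    for flow, c in counts.items():
        estimated_packets = c * sampling_rate
        # Rule-of-thumb 95% bound on the relative error for c samples.
        percent_error = 196.0 * math.sqrt(1.0 / c)
        estimates[flow] = (estimated_packets, percent_error)
    return estimates


# Made-up samples: one heavy flow and one light flow.
samples = [("10.0.0.1", "10.0.0.2")] * 100 + [("10.0.0.3", "10.0.0.2")] * 4
result = estimate_flows(samples, sampling_rate=512)
for flow, (packets, err) in result.items():
    print(flow, packets, f"+/- {err:.1f}%")
```

The light flow shows the main limitation of the approach: with only a handful of samples the error bound approaches 100%, so small flows can be detected but not measured precisely.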

Due to the size of the ATLAS network, the collection and analysis of the sFlow data from all devices generates a data handling problem of its own.

This report describes how this problem is addressed by a system that can collect and store the data either centrally or in a distributed fashion, according to need. It discusses the methods used to present the results in a form relevant to system analysts, and explores the possibilities and limitations of this diagnostic tool, giving some examples of its use in solving system problems that arise during ATLAS data taking.