Organizers: Jianhua Ruan, Wei Wang, Weining Zhang with support from the CS Graduate Student Association and faculty.
Time: 1:00-2:00 pm, Wed / Fri (as posted below)
Place: NPB 3.108, CS Conference Room
Schedule for Spring 2017
- Wed, 1/25
- Lee Boyd (Zhang lab)
- Riad Akram (Muzahid lab)
- Fri, 2/10
- David Holland (Zhang lab)
- Jin Han (Yu lab)
- Fri, 3/3
- Foyzul Hassan (Wang X. lab)
- Brita Munsinger (Quarles lab)
- Wed, 3/29
- Joy Rahman (Lama lab)
- Sharif Mohammad Shahnewaz Ferdous (Quarles lab)
- Wed, 4/26
- Sam Silvestro (Liu lab)
- Richard Garcia-Lebron (Xu lab)
Abstract: Since 2012, through optimizations and enhancements, neural networks have grown deeper and smarter, finding their way into many sufficiently complex computing tasks and earning the popular nickname 'deep learning'. Deep learning's foundational algorithms rest on gradient descent (GD). I discuss a simple example of a GD algorithm and show the evolution from machine learning to deep learning. Included in this evolution are early optimizations such as the convolutional neural network (CNN), which uses drastically fewer weights for significantly reduced memory storage, allowing a shift from CPUs to GPUs with no loss in accuracy. Inspired by biological neural networks (a cat's visual cortex), CNNs use convolutions to abstract local input in a structured, hierarchical way, with overlapping filters convolved over the input. Despite the gains from CNNs, networks must grow virtually deeper and physically smaller, and must train faster, due to big-data fire-hoses, power constraints, device embedding, and progressively more complex problems. As with most problems in big data, parallelization is the key. The hierarchical nature of networks, however, makes parallelization a challenge. Traditionally, data parallelism or model parallelism are the choices for this type of processing. I cover the costs and benefits of these types of parallelism relative to fully connected, convolutional, and pooling layers, and discuss an important technique that harnesses compiler-optimized matrix multiplication. In conclusion, I drop some hints of research that will further the work and explain why it is important.
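The GD update rule the abstract refers to can be sketched in a few lines. This is a minimal illustration on a one-dimensional quadratic, not the speaker's example: the same rule, applied to millions of weights with gradients from backpropagation, is what trains deep networks.

```python
# Minimal gradient descent sketch: minimize f(w) = (w - 3)^2.
# The function and learning rate are illustrative assumptions.

def grad(w):
    # Analytic gradient of (w - 3)^2
    return 2 * (w - 3)

def gradient_descent(w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

w_star = gradient_descent(w0=0.0)
print(round(w_star, 4))  # converges near the minimizer w = 3
```

Each step shrinks the distance to the minimizer by a constant factor (here 0.8), which is why the loop converges quickly for this convex toy problem.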
Abstract: Approximate computing is gaining a lot of traction due to its potential for improving performance and, consequently, energy efficiency. This project explores the potential for approximating locks. We start out with the observation that many applications can tolerate occasional skipping of computations done inside a critical section protected by a lock. This means that for certain critical sections, when the enclosed computation is occasionally skipped, the application suffers from quality degradation in the final outcome but it never crashes/deadlocks. To exploit this opportunity, we propose Approximate Lock (ALock). The thread executing ALock checks if a certain condition (e.g., high contention, long waiting time) is met and if so, the thread returns without acquiring the lock. We modify some selected critical sections using ALock so that those sections are skipped when ALock returns without acquiring the lock. We experimented with 14 programs from the PARSEC, SPLASH2, and STAMP benchmarks. We found a total of 37 locks that can be transformed into ALock. ALock provides performance improvement for 10 applications, ranging from 1.8% to 164.4%, with at least 80% accuracy.
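The ALock idea can be illustrated with a small sketch: a lock wrapper that gives up after a bounded wait, telling the caller to skip the critical section. The class name, timeout value, and "long waiting time" condition below are illustrative assumptions, not the authors' implementation.

```python
import threading

class ApproxLock:
    """Sketch of an approximate lock: if the lock cannot be acquired
    within a timeout (a proxy for high contention / long waiting time),
    the caller is told to skip the critical section instead of blocking."""

    def __init__(self, timeout=0.001):
        self._lock = threading.Lock()
        self._timeout = timeout

    def acquire(self):
        # True -> run the critical section; False -> skip it approximately.
        return self._lock.acquire(timeout=self._timeout)

    def release(self):
        self._lock.release()

total = 0
alock = ApproxLock()

def worker():
    global total
    if alock.acquire():
        try:
            total += 1          # the protected computation
        finally:
            alock.release()
    # else: section skipped under contention (tolerable quality loss)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # at most 8; may be lower if some sections were skipped
```

The point of the sketch is the control flow: a failed acquire is not an error but a deliberate, quality-degrading shortcut.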
Abstract: Container technology, a form of operating-system-level virtualization, can manage limited resources better than hypervisor-based VMs. Could High Performance Computing (HPC) applications perform well on a cluster of containers? This paper examines performance and resource-usage trade-offs between jobs executed on clusters of light-weight containers vs. clusters of VMs. Benchmarks are presented that validate the idea that some types of HPC applications, which require real-time launching on scarce resources, are better suited to a container-cluster-based execution architecture. A new architecture to build out and manage the execution of container clusters is described.
Reference: de Alfonso, C., Calatrava, A., & Molto, G. (2017). Container-based virtual elastic clusters. Journal of Systems and Software, 127, 1-11.
Abstract: Cache side-channel attacks have been extensively studied on x86 architectures, but much less so on ARM processors. The technical challenges to conducting side-channel attacks on ARM presumably stem from the poorly documented ARM cache implementations, such as cache coherence protocols and cache flush operations, and also from the lack of understanding of how different cache implementations affect side-channel attacks. This paper presents a systematic exploration of vectors for Flush-Reload attacks on ARM processors. Flush-Reload attacks are among the most well-known cache side-channel attacks on x86, and previous work has shown that they are capable of exfiltrating sensitive information with high fidelity. We demonstrate in this work a novel construction of Flush-Reload side channels on last-level caches of ARM processors which, in particular, exploits return-oriented programming techniques to reload instructions. We also demonstrate several attacks on Android OS (e.g., detecting hardware events and tracing software execution paths) to highlight the implications of such attacks for Android devices.
Reference: Yuan Xiao, Yinqian Zhang. Return-Oriented Flush-Reload Side Channels on ARM and Their Implications for Android Devices. CCS'16, Vienna, Austria, Oct. 2016.
Abstract: Continuous Integration (CI) is a well-adopted development practice where developers integrate their work after applying code modifications. CI servers usually check Git or SVN repositories for code changes and, if modifications are committed to the repository, perform the build, unit testing, and integration testing and generate a test summary report. Despite the widespread adoption of CI, little is understood about the multiplicity of errors that may occur during a build, and the factors that lead to build failures. Yet, during development, a large amount of time and focus goes into finding such errors, and then fixing the broken build to allow continued development on top of successfully built and tested changes. Furthermore, for large software projects, build chains may often run for a long time. A long build chain inhibits one of the key purposes of CI: producing rapid feedback on the effects of an integration on the system. In our research, we propose a build prediction model that uses build log clustering and AST-level code changes to predict whether a build will be successful, so as to avoid integration delay.
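One ingredient of the proposed model, build log clustering, can be sketched with a simple similarity-based grouping. The tokenization, Jaccard similarity, and threshold below are illustrative assumptions, not the speaker's method; the idea is that logs of the same failure type land in the same cluster, which can then serve as a feature for build-outcome prediction.

```python
# Hypothetical sketch: cluster build logs by token overlap so that
# recurring failure types can feed a build-outcome predictor.

def tokens(log):
    return {t for t in log.lower().split() if t}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_logs(logs, threshold=0.5):
    clusters = []   # each cluster holds the token sets of its members
    labels = []
    for log in logs:
        ts = tokens(log)
        for i, members in enumerate(clusters):
            # Compare against the cluster's first (representative) log.
            if jaccard(ts, members[0]) >= threshold:
                members.append(ts)
                labels.append(i)
                break
        else:
            clusters.append([ts])
            labels.append(len(clusters) - 1)
    return labels

logs = [
    "error cannot find symbol Foo in module core",
    "error cannot find symbol Bar in module core",
    "test failure timeout in integration suite",
]
print(cluster_logs(logs))  # → [0, 0, 1]: the two compile errors group together
```

A real system would use richer features (TF-IDF, AST diffs) but the clustering step has this basic shape.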
Abstract: Cybersecurity awareness and cyber skills training are vitally important and challenging. A huge number of attacks against everyday users occur routinely. Prevention techniques and responses are wide ranging but are only effective if used properly. The objective of this research is to teach everyday users the requisite cybersecurity skills through gaming, beyond the current state of practice. Because the skill level of the trainees is also wide ranging, from casual computer users to software engineers to system administrators to managers, the games must be capable of training this wide range of computer users. Computer games can provide a medium for delivering training in an engaging format at levels appropriate for the individual trainees. In this paper we (1) describe the state of practice by describing the gaming tool used in most cyber challenges at high schools and colleges in the U.S., i.e., the cybersecurity gaming tool CyberNEXS (Science Applications International Corporation), (2) outline some of the additional topics that should be addressed in cybersecurity training, and (3) note some other approaches to game design that might prove useful for future cybersecurity training game development beyond CyberNEXS.
Reference: Nagarajan, A., Allbeck, J. M., Sood, A., & Janssen, T. L. (2012). Exploring game design for cybersecurity training. In 2012 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 256-262. IEEE.
Abstract: Big Data processing in the cloud is increasingly popular due to the economic benefits of elastic resource provisioning. However, data-intensive cloud services suffer from the overheads of data movement between compute and storage clusters, due to the decoupled architecture of compute and storage clusters in existing cloud infrastructure. Furthermore, cloud storage clusters are often underutilized, since they need to be continuously up and running regardless of changing workload conditions, for high availability and fault tolerance. In this work, we explore a unique opportunity for in-situ Big Data processing on the storage cluster by dynamically offloading data-intensive jobs from the compute cluster to leverage the idle CPU cycles of the storage cluster and improve job throughput. However, it is challenging to achieve this goal, since introducing additional workload on the storage cluster can significantly impact interactive web requests that fetch cloud storage data, which typically impose strict SLA requirements in terms of the 90th-percentile response time. We propose a novel compute-storage multiplexing technique that aims to improve big data processing throughput and storage cluster utilization without violating the SLA of interactive requests. We designed and implemented a system that performs real-time monitoring of big data and storage interactive workloads, and applies a distributed rate limiting technique to efficiently multiplex the compute and storage clusters for big data processing. Experimental results using big data and cloud-storage benchmarks show that our approach improves job throughput by 1.7 times and storage cloud CPU utilization by 75% while maintaining the SLA for interactive requests.
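The rate-limiting building block can be illustrated with a classic token bucket: offloaded big-data work on a storage node proceeds only when tokens are available, leaving headroom for interactive requests. The class name, rate, and capacity below are illustrative assumptions, not the system described in the talk.

```python
import time

class TokenBucket:
    """Sketch of a rate limiter for offloaded work: tasks are admitted
    only while tokens remain, so interactive storage requests keep
    their share of CPU. Parameters are illustrative assumptions."""

    def __init__(self, rate, capacity):
        self.rate = rate              # tokens replenished per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_consume(self, n=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller should defer the offloaded task

bucket = TokenBucket(rate=100, capacity=10)
admitted = sum(bucket.try_consume() for _ in range(50))
print(admitted)  # roughly the burst capacity is admitted immediately
```

A distributed version would coordinate token budgets across storage nodes based on the monitored interactive load, but each node's admission decision has this shape.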
Abstract: Most people experience some imbalance in a fully immersive Virtual Environment (VE) (i.e., wearing a Head Mounted Display (HMD) that blocks the user's view of the real world). However, this imbalance is significantly worse in People with Balance Impairments (PwBIs), and minimal research has been done to improve this. In addition to the imbalance problem, lack of proper visual cues can lead to different accessibility problems for PwBIs (e.g., small reach from the fear of imbalance, postural instability, etc.). We plan to explore the effects of different visual cues on people's balance, reach, etc. Based on our preliminary study, we propose to incorporate additional visual cues in VEs that proved to significantly improve the balance of PwBIs while they are standing and playing in a VE. Our current study showed that additional visual cues have similar effects in augmented reality. We are also developing studies to research reach and the presence of a virtual instructor in VR as our future work.
Abstract: Graph coloring is a fundamental problem that has many applications. Because of the inherent hardness of the problem, proper coloring (i.e., the two end nodes of every edge are assigned different colors) requires sufficiently many colors (often many more than the chromatic number of the graph in question). However, there are application settings (e.g., cyber security) where "colors" are very expensive to obtain. This motivates the concept of defective coloring, where the two end nodes of some edges are assigned the same color. This setting calls for defective coloring algorithms that can achieve a best-effort result with respect to a given small number of colors. We extend the analysis of diversity to include attack power, the cost of initial compromise, and lateral movement. In addition, we present a generalized framework that includes multiple layers of diversification and takes into consideration possible similarities between configurations.
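A best-effort coloring under a fixed color budget can be sketched with a simple greedy rule: each node takes the color least used among its already-colored neighbors, and any remaining monochromatic edges are the "defects". This is only an illustration of the concept, not the algorithm from the talk.

```python
# Sketch of defective greedy coloring with a budget of k colors.
# Graph representation and tie-breaking are illustrative assumptions.

def defective_greedy(adj, k):
    color = {}
    for v in adj:
        counts = [0] * k
        for u in adj[v]:
            if u in color:
                counts[color[u]] += 1
        color[v] = counts.index(min(counts))  # least-conflicting color
    return color

def defects(adj, color):
    # Count edges whose two endpoints share a color.
    return sum(color[u] == color[v]
               for u in adj for v in adj[u] if u < v)

# A triangle needs 3 colors; with k = 2, one edge must be defective.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
c = defective_greedy(adj, 2)
print(defects(adj, c))  # → 1
```

In the security setting the abstract describes, each "color" would correspond to an expensive distinct configuration, and each defect to a pair of adjacent machines sharing one.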
Questions and Comments?
Please send emails to email@example.com.