POS-18:Papers with Abstracts

Abstract. Modern CDCL (conflict-driven clause learning) SAT solvers are used for many practical applications. One of the key ingredients of state-of-the-art CDCL solvers are efficient restart schemes. The main contribution of this work is an extensive empirical evaluation of various restart strategies. We show that optimal static restart intervals are not only correlated with the satisfiability status of a certain instance, but also with the more specific problem class of the given benchmark. We further compare uniform restart intervals with the performance of non-uniform restart schemes, such as Luby restarts. Finally, we revisit the dynamic restart strategy used in Glucose and propose a new variant thereof, which is based on the concept of exponential moving averages. The resulting implementation in Lingeling improves state-of-the-art performance in SAT solving.
Abstract. In recent years, a lot of effort has been expended in determining if SAT solver performance is predictable. However, the work in this area invariably focuses on individual machines, and often on individual solvers. It is unclear whether predictions made on a specific solver and machine are accurate when translated to other solvers and hardware. In this work we consider five state-of-the-art solvers, 26 machines and 143 feature instances selected from the 2011 to 2014 SAT competitions. Using combinations of solvers, machines and instances we present four results: First, we show that UNSAT instances are more predictable than corresponding SAT instances. Second, we show that the number of cores in a machine has more impact on performance than L2 cache size. Third, we show that instances with fewer reused clauses are more CPU bound than those where clause reuse is high. Finally, we make accurate predictions of solution time for each of the instances considered across a diverse set of machines.
Abstract. Contemporary SAT solvers emit unsatisfiability proofs in the DRAT format to increase confidence in their answers. The state-of-the-art proof verifier drat-trim uses backward- checking and deletion information to efficiently check unsatisfiability results. Checking large proofs still takes as long as solving a formula, and therefore parallelization seems to be a promising approach. However, Heule et al. stated that parallelization of the backward- checking procedure is difficult since clausal proofs do not include dependency information. In this paper, we present a parallelization approach and a prototypical implementation that scales reasonably compared to its sequential version.
Abstract. We highlight important real-world optimization problems arising from data analysis and machine learning, representing somewhat atypical applications of SAT-based solver technology, to which the SAT community could focus more attention on. To address the problem of current lack of heterogeneity in benchmark sets available for evaluating MaxSAT solvers, we provide a benchmark library of MaxSAT instances encoding different data analysis and machine learning problems. By doing so, we also advocate extending MaxSAT solvers to accept real-valued weights for soft clauses as input via the presented problem domains in which the use of real-valued costs plays an integral role.
Abstract. A Pseudo-Boolean (PB) constraint is a linear inequality constraint over Boolean variables. A popular idea to solve PB-constraints is to transform them to CNFs (via BDDs, adders and sorting networks [5, 11]) and process them using – increasingly improving – state-of-the-art SAT-solvers. Recent research have favored the approach that uses Binary Decision Diagrams (BDDs), which is evidenced by several new constructions and optimizations [2, 21]. We show that encodings based on comparator networks can still be very competitive. We present a system description of a PB-solver based on MiniSat+ [11] which we extended by adding a new construction of selection network called 4-Way Merge Selection Network, with a few optimizations based on other solvers. Experiments show that on many instances of popular benchmarks our technique outperforms other state-of-the-art PB-solvers.
Abstract. Performing hundreds of test runs and a source-code analysis, we empirically identified improved parameter configurations for the CryptoMiniSat (CMS) 5 for solving crypto- graphic CNF instances originating from algebraic known-plaintext attacks on 3 rounds encryption of the Small AES-64 model cipher SR(3, 4, 4, 4). We finally became able to reconstruct 64-bit long keys in under an hour real time which, to our knowledge, has never been achieved so far. Especially, not without any assumptions or previous knowledge of key-bits (for instance in the form of side-channels, as in [11]). A statistical analysis of the non-deterministic solver runtimes was carried out and command line parameter combinations were defined to yield best runtimes which ranged from under an hour to a few hours in median at the beginning. We proceeded using an Automatic Algorithm Configuration (AAC) tool to systematically extend the search for even better solver configurations with success to deliver even shorter solving times. In this work we elaborate on the systematics we followed to reach our results in a traceable and reproducible way. The ultimate focus of our investigations is to find out if CMS, when appropriately tuned, is indeed capable to attack even bigger and harder problems than the here solved ones. For the domain of cryptographic research, the duration of the solving time plays an inferior role as compared to the practical feasibility of finding a solution to the problem. The perspective scalability of the here presented results is the object of further investigations.
Abstract. DRAT proofs have become the de facto standard for certifying SAT solvers’ results. State-of-the-art DRAT checkers are able to efficiently establish the unsatisfiability of a formula. However, DRAT checking requires unit propagation, and so it is computationally non-trivial. Due to design decisions in the development of early DRAT checkers, the class of proofs accepted by state-of-the-art DRAT checkers differs from the class of proofs accepted by the original definition. In this paper, we formalize the operational definition of DRAT proofs, and discuss practical implications of this difference for generating as well as checking DRAT proofs. We also show that these theoretical differences have the potential to affect whether some proofs generated in practice by SAT solvers are correct or not.
Abstract. It has been an ongoing, decades-long debate about how SAT solvers and in general different or new algorithms should be evaluated and compared both in competitions and more importantly in papers. Evaluations are usually performed on existing benchmarks. Cross-validation and other means to avoid over-fitting are rarely used. In this paper we revisit the old idea of scrambling benchmarks also used in early competitions. Scrambling has the goal to make results of such evaluations more robust. We present a new method for scrambling CNFs, which allows to gradually increase the effect of scrambling, from keeping the scrambled CNF close to the original CNF, to complete random permutation of variables, clauses, and phases of literals. We used this method to scramble benchmarks from the last two SAT competitions and solved them with the best solvers in the main track of the last SAT competition. As expected our experimental results suggest that scrambling has a substantial effect on the performance of individual solvers but surprisingly has little effect on rankings among solvers. As a consequence we argue that only using our method of scrambling is not enough to increase robustness of competitions and evaluations in general.
Abstract. In this paper we present new implementation details and benchmarking results for our parallel portfolio solver TopoSAT2. In particular, we discuss ideas and implementation details for the exchange of learned clauses in a massively-parallel SAT solver which is designed to run more that 1, 000 solver threads in parallel. Furthermore, we go back to the roots of portfolio SAT solving, and discuss the impact of diversifying the solver by using different restart- , branching- and clause database management heuristics. We show that these techniques can be used to tune the solver towards different problems. However, in a case study on formulas derived from Bounded Model Checking problems we see the best performance when using a rather simple clause exchange strategy. We show details of these tests and discuss possible explanations for this phenomenon.
As computing times on massively-parallel clusters are expensive, we consider it especially interesting to share these kind of experimental results.
Abstract. Experimental data and benchmarks play a crucial role in developing new algorithms and implementations of SAT solvers. Besides comparing and evaluating solvers, they provide the basis for all kinds of experiments, for setting up hypothesis and for testing them. Currently – even though some initiatives for setting up benchmark databases have been undertaken, and the SAT Competitions provide a “standardized” collection of instances – it is hard to assemble benchmark sets with prescribed properties. Moreover, the origin of SAT instances is often not clear, and benchmark collections might contain duplicates. In this paper we suggest an approach to store meta-data information about SAT instances, and present an implementation that is capable of collecting, assessing and distributing benchmark meta-data.