SATisfying Solutions to Difficult Problems

93 points by atilimcetin 6 days ago

roenxi 13 hours ago

I've always been fascinated by how linear programming seems to be applicable to every problem under the sun but SAT solvers only really seem to be good at Sudoku.

In practice that are a bunch of problems that seem to be SAT, but they are either SAT at a scale where the solver still can't find a solution in any reasonable time or they turn out to not really be SAT because there is that one extra constraint that is quite difficult to encode in simple logic.

And it is unrewarding experimenting because it ends up being a day or so remembering how to use a SAT solver, then rediscovering how horrible raw sat solver interfaces are and trying to find a library that builds SAT problems from anything other than raw boolean algebra (the intros really undersell how bad the experience of using SAT solvers directly is, the DIMACS sat file format makes me think of the year 1973), then discovering that a SAT solver can't actually solve the alleged-SAT problem. Although if anyone is ever looking for a Clojure library I can recommend rolling-stones [0] as having a pleasant API to work with.

[0] https://github.com/Engelberg/rolling-stones

zero_k 6 hours ago

SAT solvers are used _everywhere_. Your local public transport is likely scheduled with it. International trains are scheduled with it. Industrial automation is scheduled with it. Your parcel is likely not only scheduled with it, but even its placement on the ship is likely optimised with it. Hell, it's even used in the deep depths of cryptocurrencies, where the most optimal block composition is computed with it. Even your friendly local nuclear reactor may have had its failure probability computed with (a variation of) it. In other words, it's being used to make your life cheaper/better/safer/easier. Google a bit around, open your eyes Neo ;)
PS: Yes, I develop a propositional SAT solver that used to be SOTA [1]. I nowadays develop a propositional model counter (for computing probabilities), that is currently SOTA [2]
[1] https://github.com/msoos/cryptominisat/ [2] https://github.com/meelgroup/ganak/
- taeric 5 hours ago
  
  I confess I would expect a lot of those would be linear programming more than sat? Mixed integer would not surprise me.
- fulafel 4 hours ago
  
  Are there open source examples of usage for real world problems, for example train scheduling or something else than software engineering practicioners might find relatable?
  - JonChesterfield an hour ago
    
    Register allocation, instruction selection and instruction scheduling can, with a degree of bloodyminded patience, all be solved with boolean SAT. That's a compiler backend.
    I like the higher level CSP more as an interface but those are _probably_ best solved by compilation to SAT. SMT also worth a look.
sirwhinesalot 12 hours ago

SAT solvers are rarely used directly, they're usually a core component of a more expressive solver type like an LCG solver or an SMT solver.
And if not that, then they are used as the core component of a custom solver that speaks some higher level domain language.
Encoding things in CNF is a pain (even with higher level APIs like PySAT). But it's not usually something you should be doing unless you are developing a solver of some sort yourself.
emil-lp 13 hours ago

> ... SAT solvers only really seem to be good at Sudoku.
This is really not true.
SAT solvers are really good these days, and many (exact) algorithms (for NP-hard problems) simply use some SOTA SAT solvers and are automatically competitive.
- roenxi 13 hours ago
  
  At doing what, though? Why are they solving the SAT problem?
  - emil-lp 13 hours ago
    
    Because you can encode many (actually all) problems as a SAT instance, and the answer to that sat instance can be translated into an answer for the original problem.
    
    tannhaeuser 12 hours ago
    
    Before you go and try to encode "all problems as SAT instance", I'd recommend to consider that SAT formulations require a fixed number of (Boolean) variables. Sure, you can use tens of thousands of helper variables to encode a problem, but at a certain point this gets unwieldy and basically all you're doing is working around SAT limitations and spend your time implementing a SAT encoder (and translator for the answer into your original domain language as you say).
    Even simple goal-directed block's world-like robotic planning problems where the number of moves and the items to pickup/putdown are variable are much easier formulated and solved using Prolog.
    
    maweki 9 hours ago
    
    Incremental automatic grounding to SAT works fine for ASP.
    
    pxx 8 hours ago
    
    "All" isn't right.
    You can only encode decision problems in NP into a SAT instance of polynomially-balanced size. Sure, that's a lot of things, but there are things provably not in this set.
egl2020 9 hours ago

"SAT solvers only really seem to be good at Sudoku": if you use conda or uv, you've used an SAT solver.
dualogy 4 hours ago

> the intros really undersell how bad the experience of using SAT solvers directly is, the DIMACS sat file format makes me think of the year 1973
"MiniZinc" is the name of the Pythonic-ish-like syntax targeting (ie on-the-fly translating to) numerous different solvers (somewhere around half-a-dozen to a dozen or so, don't recall exactly =)
- dekhn 4 hours ago
  
  I had never used SAT before (was familiar with the concept) and recently wanted to avoid thinking, so I asked gemini (after a few test prompts):
  """Create a MiniZinc script to find the optimal E12-series resistor values for the base ($R_B$) and collector ($R_C$) of a PN2222A transistor switch.
  Specifications:
  Collector Voltage Supply ($V_{CC}$): 12V DC
  Base (Input) Voltage ($V_{IN}$): 3.3V DC
  Target Collector Current ($I_C$): Maximize, but do not exceed, 1W (1 watt).
  The script must correctly model the circuit to ensure the transistor is in saturation and must execute without Gecode solver errors like 'Float::linear: Number out of limits"""
  After a few more try-paste exception loops, that generated a lovely and readable MiniZinc script with variables I can adjust for other circuits. It was exciting to see that basically every constraint problem I learned in school ("at what angle should you swim across the river..." is just a matter of encoding the problem and running a solver (I still think people should learn the basics of constraint problems, but after a certain point the problems are just tricky enough that it makes more sense to teach people how solvers work, and how to encode problems for them...")
taeric 8 hours ago

I'm curious if you have examples of problems you don't think they are good at solving? Agreed that they are not a panacea of solving problems, but if you are able to somewhat naturally reduce your problem to a SAT statement, they are silly tough to beat.
pfdietz 9 hours ago

When I've tried using SAT (or SMT) solvers I've had issues with scalability. The solution times, even if they didn't increase exponentially, tended to go up as some higher polynomial (like, cubic) in the size of the initial problems I was trying them on.
- JonChesterfield an hour ago
  
  My experience is they respond yes/no very quickly for lots of problems, but as the problem approaches "probably narrowly solvable" the runtime goes exponential.

ViscountPenguin 14 hours ago

I love SAT solvers, but way more underappreciated by software engineers are MILP solvers.

MILPs (Mixed Integer Linear Programs) are basically sets of linear constraints, along with a linear optimization functions, where your variables can either be reals or ints.

Notably, you can easily encode any SAT problem as a MILP, but it's much easier to encode optimization problems, or problems with "county" constraints as MILPs.

sirwhinesalot 14 hours ago

Both are severely underused for sure. But it didn't help that for a long time open source MILP solvers were pretty mediocre.
HiGHS didn't exist, SCIP was "non-commercial", CBC was ok but they've been having development struggles for awhile, GLPK was never even remotely close to commercial offerings.
I think if something like Gurobi or Hexaly were open source, you'd see a lot more use since both their capabilities and performance are way ahead of the open source solutions, but it was the commercial revenue that made them possible in the first place.
Using CP-SAT from Google OR-Tools as a fake MILP solver by scaling the real variables is pretty funny though and works unreasonably well (specially if the problem is highly combinatorial since there's a SAT solver powering the whole thing)
- FreakLegion 13 hours ago
  
  SCIP going Apache definitely improved the landscape, but Couenne (global MINLP), Bonmin (local MINLP), and IPOPT (local NLP, but e.g. [1] gets you MINLP) are solid and have been around for a long time. And anecdotally, I've seen a lot more issues with SCIP (presolvers and tolerances, mostly) than with other solvers. Still it's replaced Couenne in my toolbox, and Minotaur has replaced Bonmin, but IPOPT remains king of its domain.
  1. E.g. https://en.wikipedia.org/wiki/Randomized_rounding.
  - zvr 10 hours ago
    
    Many thanks to you and the parent comment for providing names to search when looking for implementations.
    A basic question, before searching these: are they "input compatible"? I mean, can a problem be formulated once and then be solved by a variety of solvers? Or does each one of them use its own input language?
    
    sirwhinesalot 8 hours ago
    
    For MILP there isn't one single standard, but multiple competing solutions.
    Nearly every solver supports the MPS format, but that's a really old format straight from the era of punchcards, it sucks.
    Many solvers support the nl format, which is a low level format spat out by the AMPL tool (commercial software for mathematical modeling).
    Many solvers support the CPLEX lp format, which is a nice human readable and writable format.
    Google OR-Tools includes an API for mathematical modeling that supports the relevant open source MIP solvers plus Gurobi I think and its own CP solver. There are Python and Julia packages that try to do the same (rather than calling the solver APIs directly they usually spit out a problem in nl format though).
    MiniZinc supports various open source MILP solvers plus various CP solvers. Very nice language, very high level.
    For MINLP the only standard I know of is OSiL but support for it is spotty, mostly supported by open source tools I think.
    
    FreakLegion 3 hours ago
    
    This is a good list --- would also add Pyomo. There's plenty of nuance to algebraic modeling languages like Pyomo and JuMP, but at base you're just writing mathematical expressions in Python for Pyomo (or in Julia for JuMP) to parse and transform into the target format. E.g. taking the objective from the Weapon Target Assignment problem (https://en.wikipedia.org/wiki/Weapon_target_assignment_probl...):
    def objective(model: WtaModel) -> SumExpression: return sum( model.target_values[t_j] * prod( model.survival_rates[w_i][t_j] ** model.target_selected[w_i, t_j] for w_i in model.weapon_types ) for t_j in model.targets )
    
    postexitus 6 hours ago
    
    Also GAMS
    
    sundarurfriend 5 hours ago
    
    JuMP [1] in Julia is one way to get that - it uses a common DSL that they gets translated to work with any of a variety [2] of solvers.
    [1] https://jump.dev/ [2] https://jump.dev/JuMP.jl/stable/installation/#Supported-solv...
  - sirwhinesalot 12 hours ago
    
    Didn't know about Randomized rounding. Is there any solver with built-in support for that? To turn a strong NLP solver into a fast but approximate MINLP solver?
    
    FreakLegion 2 hours ago
    
    Not necessarily randomized rounding in particular, but many solvers use rounding methods internally e.g. as part of a feasibility pump. Minotaur and SCIP definitely do this.
muragekibicho 14 hours ago

SATs are cool but MILPs are cooler IMO. Lol I've been trying to train a neural network over a finite field, not the reals and oh my god MILPs are God's gift to us.
- ViscountPenguin 12 hours ago
  
  Huh, that's an interesting idea.
  If you get sick of MILPs, maybe you could use a representation of your finite field instead of the field itself? That way you could do everything in C^n, and preserve differentiability to use SGD or something like it.

isolay 9 hours ago

This is interesting, but techniques like CDCL seem to only ever find any one valuation that makes a proposition true. My homegrown solver finds all valuations that make a proposition true and then can eliminate redundancies, such that `X or not X and Y` gets simplified to `X or Y`, just to mention one example (proof: truth tables are identical). Are there any other SAT solvers out there that do something like that? My own one suffers from combinatorial explosion in the simplification stage when expressions get "complex" enough. But then, the simplification is NP-complete, AFAICT.

thesz 7 hours ago
```
  > This is interesting, but techniques like CDCL seem to only ever find any one valuation that makes a proposition true.
```
Most, if not all, current SAT solvers are incremental. They allow you to add clauses on the fly.
So, when you find the solution, you can add a (long!) clause that blocks that solution from appearing again and then run solver again.
Most, if not all, SAT solvers have a command line option for that enumeration done behind the scenes, for picosat it is "picosat --all".
- sirwhinesalot 7 hours ago
  
  For problems with a very large number of solutions this quickly becomes inefficient. The blocking clauses will bog down the solver hard and waste tons of memory.
  A more clever approach is to emulate depth-first search using a stack of assumption literals. The solver still retains learned conflict clauses so it's more efficient than naive DPLL.
emil-lp 8 hours ago

It's not very difficult to turn a CDCL solver into an ALL-SAT solver, and there are many publications available doing exactly that.
- fcholf 7 hours ago
  
  I would also add that #SAT solvers, aiming at counting the number of solutions, are often implicitly solving the ALL-SAT in a more efficient manner than what you would have with modified CDCL solvers because they use other caching and decomposability techniques. Check knowledge compiler d4 for example https://github.com/crillab/d4v2 that can build some kind of circuits representing every solution in a factorized yet tractable way.

js8 9 hours ago

I don't understand why SAT solvers don't use gaussian elimination more. Every SAT problem can be represented as an intersection of linear (XORSAT) and 2SAT clauses, and the linear system can resolve some common contradictions, propagate literals, etc.

Also Grobner basis algorithm over polynomials in Z_2 can be used to solve SAT. A SAT problem can be encoded as a set of quadratic polynomials, and if the generated ideal is all polynomials, the system is unsatisfiable (that's Nullstellenansatz). I don't understand how we can get high degree polynomials when running Grobner basis algorithm that specifically prefers low degree polynomials. It intuitively doesn't make sense.

dooglius an hour ago

SAT is NP-complete, and both gaussian elimination and 2SAT are polynomial-time, so this would suggest either you're mistaken or there is some hidden catch here (like the size of one or the other being exponential-sized).
- js8 an hour ago
  
  There is no catch - I even describe the reduction in another comment below. You can convert a 3SAT clause to a combination of XORSAT and 2SAT clauses. I am not mistaken, either, I used this reduction many times on practical problems, so I know it works. I encourage you to try it.
  Unfortunately, putting the algorithms for XORSAT and 2SAT together is not trivial at all, they are quite different (but Grobner bases over GF(2) seem very promising in that).
  But I agree that the fact that both XORSAT and 2SAT have polynomial algorithms is quite a strong indicator that full SAT has a one too. :-) (On the other hand, there is IMHO only very little actual evidence for P!=NP.)
sirwhinesalot 9 hours ago

I have no idea about most of the words you wrote but I'd love to see alternative approaches to SAT solving.
CryptoMiniSAT has native support for Gaussian Elimination but it has to put a lot of effort into recovering XORs from the CNF.
A different format (XORSAT + 2SAT) plus an efficient algorithm to exchange information from the two sides of the problem would be interesting.
- zero_k 6 hours ago
  
  Author of CryptoMiniSat here :) XOR+CNF is indeed supported by CryptoMiniSat. Which is cool, but if you _really_ think about it, the resolution operator over these two are gonna give you multivariate polynomials over GF(2). So resolution is poor in CryptoMiniSat, because it only encodes one of the constraints that this polynomial implies (i.e. one that can be encoded in a single disjunctive clause). And if you wanna do the _real_ deal, i.e. "properly" solve multivariate polynomials over GF(2) then you are in for a ride -- the all-powerful, much-feared, Grobner basis algorithms, and I am not touching those with a 100m pole, because they are hell on wheels :) I mean... it's possible to contribute to them, and I know of two people who did: https://theory.stanford.edu/~barrett/fmcad/slides/5_Kaufmann... and of course, https://link.springer.com/chapter/10.1007/978-3-031-37703-7_... i.e. Daniela and Alex. It's... rough :D
  Just my 2 cents.
  - js8 5 hours ago
    
    Thanks for the links to those people. The GF(2) simplifies Grobner bases calculations a lot, IMHO, but I don't have much experience with them either. I am just curious, because to me it now seems to be an obvious way to go. I mean, the fact that we can represent any SAT problem as an intersection of 2SAT and XORSAT problems indicates, there must be some generalization of the both polynomial algorithms. And it seems to me this generalization is somehow related to Grobner bases methods.
    I have only very quickly skimmed it, but I wouldn't be surprised if the theorem D.21 in Kaufman's thesis (https://danielakaufmann.at/wp-content/uploads/2020/11/Kaufma...) turned out to be true for all the unsatisfiability PAC proofs, not just the circuits she is looking at. (As I commented below. If you're proving contradiction, looking for element 1 in the ideal using Grobner basis, then it seems somewhat unreasonable to require degree of basis polynomials larger than 3, if you start from all polynomials with degree less or equal than 2. If you look what e.g. 2SAT algorithm is doing algebraically, it only needs degree 3 polynomials as well, although the monomial of degree 3 is immediately eliminated. So if the Grobner basis algorithm needs to build large degree polynomials, because no small degree will help you, it's likely your system is already satisfiable. Would really like to see a counterexample.)
- js8 8 hours ago
  
  One way to recover XORs from 3SAT (AFAIK unfortunately DIMACS doesn't support XOR natively) is to use the formula:
  (a | b | c) = ((a ^ ~x) | b) & (x | c)
  where x is a variable that can be chosen. The LHS is satisfiable iff RHS is satisfiable. This gives you extra variable per clause, but you can eliminate some of them using GE. (Probably not an ideal approach to easy problems, but IMHO worth trying for the hard ones.)
  Then you get a set of clauses that have what I call "generalized literals" - essentially a linear combination of literals - in clauses that only have 1 OR (I call this 2-XORSAT). These can be trivially transformed to intersection of XORSAT and 2SAT.
  Another thing that DIMACS is missing, it's not very modular. So you cannot represent a bunch of conditions using already pre-solved XORSAT portion, and apply that to a new problem. (For example, we could encode a generic multiplier in SAT, relating inputs and outputs. Then we could presolve the XORSAT portion of the multiplier. Then we want to factorize a number. Well, just add a bunch of constants specifying output that needs to be satisfied and you don't need to solve the XORSAT for the multiplier again.)
  Or, you can convert the 2-XORSAT into quadratic polynomials over Z_2 that all need to be equal to 0. Then you can use the Grobner basis algorithm to find whether these polynomials can be linearly combined (using polynomials as coefficients) to give 1 (i.e. they generate an ideal which contains the whole ring of polynomials), meaning the contradictory equation 1=0 has to be satisfied, and so the problem is unsatisfiable.
  What I would really like to understand, how it can happen, if you are running the Grobner basis algorithm and keep an eye on the degree, that you build the degree up from low degree polynomials. It seems to me, if the problem is unsatisfiable, you should be always able to get by with polynomials of low degree, because why carry extra variables if you're gonna eliminate the terms anyway? (If low degree polynomials are sufficient for the Grobner basis, it would imply P=NP by the way, because there is only polynomial number of polynomials of a given maximum degree.)
  But yeah CryptoMiniSAT looks somewhat promising.
  - fcholf 7 hours ago
    
    To efficiently encode XOR-constraint you will necessarily need extra variables (as you proposed) because we know that no bounded depth and polynomial size Boolean circuit can encode parity ("parity is not in AC0").
    The problem with this kind of constraints is that they are not local and really disturb CDCL based SAT solvers because you cannot remove the constraint unless you have fixed all its variables. Now, cryptominisat has a specific focus on this kind of constraints and implements ad-hoc routines to exploit this routine. cryptominisat is hence focused toward applications where this kind of constraint naturally appears, be it circuit synthesis or cryptography.
    Now, CDCL solver is another beast. Gaussian elimination / Grobner basis are not implemented in these solvers simply because it is "too costly". One way of understanding why is just to reframe what one call SAT solver. This is not really a tool to solve SAT (hell, they can't even deal with the pigeon hole principle without preprocessing), the purpose of a SAT solver is to quickly solve instances coming from natural encoding of constraint systems. In this framework, unit propagation and clause learning are basically extremely efficient because they uncover hidden structure in the way, us human, encode constraint problems. Hence (CDCL) SAT solvers are highly efficient in many applications. But it is easy to kill one with a few variables by simply going out of this framework. For example, they are killers in solving packages dependencies because the structure of these problems matches this approach, CDCL solvers are then just an extremely refined way of bruteforcing your way through the search space.
    Now, if your application heavily contains XOR-constraints, then it is likely a (CDCL) SAT solver is not the right approach. For circuit synthesis, some people use BDD based approach (where parity can be efficiently encoded) but sometimes you will simply have to develop your dedicated solver or use other solvers for other NP-complete problem that will have a different area of specialization (as already observed in many comments).
    
    js8 6 hours ago
    
    I appreciate the answer.
BigTTYGothGF 3 hours ago

> I don't understand how we can get high degree polynomials when running Grobner basis algorithm
From the syzygies, surely?
- js8 3 hours ago
  
  Surely. Maybe you can give me a counterexample. Let's just consider GF(2). Give me a set of generators with polynomials of degree <=2 over n variables, such that, 1 is a member of the ideal (i.e. generators generate the whole ring of polynomials) - so if I understand this correctly, Grobner basis is just 1, and, during the construction of the Grobner basis, you need to construct a polynomial of an arbitrarily high degree (or let's say degree 7). And there is no way to avoid this (or same degree) polynomial during the basis construction by finding a lower degree polynomial first.
  See my problem? Somehow, I feel these syzygies need to be constructed from the degree 2 polynomials (the initial set), but then we can probably only work with the polynomials of the low degree instead, from which those syzygies are constructed.
  To me, this is a very interesting question. Because if we can find the Grobner basis (even for just those ideals that coincide with the whole ring) in GF(2) in a way that we only need to construct polynomials of bounded degree, then there is a polynomial algorithm for SAT. Since most people believe there isn't such algorithm, I would really like to see a counterexample to this situation.
emil-lp 8 hours ago

> I don't understand why SAT solvers don't use gaussian elimination more.
These are the types of questions that can easily be answered by simply trying.
- js8 8 hours ago
  
  I agree they can be answered, but easily - no. The Grobner basis algorithm has been known for longer than NP problems, so it's not obvious why it wasn't used to solve SAT problems.

fjfaase 12 hours ago

If you convert a sudoku to an exact cover you can usually solve it by finding colums that are a subset from another column and remove all rows that are only in one of them.

Sudokus that can be solved with reasoning alone, can be solved in polynomial time.

I recently discovered that solving exact covers, and probably also with SAT, using a generic strategy, does not always result in the most efficient way for finding solutions. There are problems that have 10^50 solutions, yer finding a solution can take a long time.

zkmon 14 hours ago

Problems, including NP-complete ones, are only a product of the way you look at them and the reference frame from where you look at them. They get their incarnation only out of the observer's context.

jgalt212 9 hours ago

At PyCon 2019, Raymond Hettinger did a nice talk on Python interfaces to a variety of solvers.

https://www.youtube.com/watch?v=_GP9OpZPUYc

mxkopy 10 hours ago

Slightly related, there’s ways to differentiate linear programs (https://github.com/cvxpy/cvxpylayers), which might allow one to endow deep neural networks with some similar reasoning capabilities as these sorts of solvers.