possible way to use it with dataframes? #187
Replies: 3 comments 15 replies
-
So exciting.
|
Beta Was this translation helpful? Give feedback.
-
Hello, I was digging into this trying to isolate the problem that part of the problem might be how the 'in' operator is mapped to z3. Crosshair watch returns something on this function after 55 seconds, meaning (if I understand correctly this part) it takes that time to generate an example given the pre-conditions. Because post-conditions are never met it should fail in the first example. This happens also with dictionaries and Sets. def keys_in_an_array(arr: List[str]) -> bool:
"""
pre: "a" in arr
pre: "b" in arr
pre: "c" in arr
pre: "d" in arr
pre: "e" in arr
post: __return__ == True
"""
return False if "a" in arr else True But solving the same constraints in z3 takes 0.1s In [15]: import z3
...: import time
...: start_time = time.time()
...: s = z3.Solver()
...: sseq = z3.Const('sseq', z3.SeqSort(z3.StringSort()))
...: s.add(z3.Contains(sseq, z3.Unit(z3.StringVal("a"))))
...: s.add(z3.Contains(sseq, z3.Unit(z3.StringVal("b"))))
...: s.add(z3.Contains(sseq, z3.Unit(z3.StringVal("c"))))
...: s.add(z3.Contains(sseq, z3.Unit(z3.StringVal("d"))))
...: s.add(z3.Contains(sseq, z3.Unit(z3.StringVal("e"))))
...: s.add(z3.Contains(sseq, z3.Unit(z3.StringVal("f"))))
...: s.check()
...: print(s.model())
...: print("--- %s seconds ---" % (time.time() - start_time))
[sseq = Concat(Unit("c"),
Concat(Unit(""),
Concat(Unit("f"),
Concat(Unit(""),
Concat(Unit("d"),
Concat(Unit(""),
Concat(Unit("e"),
Concat(Unit(""),
Concat(Unit("a"),
Concat(Unit(""),
Unit("b")))))))))))]
--- 0.01843857765197754 seconds --- Sorry if I misunderstood something, I am still trying to get my head around the internals of it. |
Beta Was this translation helpful? Give feedback.
-
Side thread: Have you ever tried anything like strictly-typed-pandas? Or, perhaps you'd invent your own more-strongly-typed dataframes? Having the actual columns and types at generation time would help a lot. You can hook into how crosshair generates values from a type with
Then also you could dispense with many of those fiddly preconditions checking column names and types; e.g.:
And CrossHair will run this quite fast, I think. |
Beta Was this translation helpful? Give feedback.
-
Hello.
I'm trying to make this tool work on dataframes (pandas, pyspark, polars, etc...) as most of the code that involve this abstraction tends to be a super buggy and tedious to test.
My first idea was to implement a very small subset of the api in pure python (as suggested in plugins documentation). I have tried a POC using a
DfSymbolicInternal = Dict[str, Union[List[int], ..]
and it kind of works. The problem is that for operations involving joins is not usable because it spends more than a minute even for the most trivial checks:I also notice that it only uses one core, but I'm not sure how parallel this can be.
It is possible to speed this up just changing the formulation of the preconditions somehow or it would be necessary z3 and implement an specific modulo theory for dataframes? What would be your recommendation to try to solve this?
Beta Was this translation helpful? Give feedback.
All reactions