Feedback on using Topology for Testing
Dan Debrunner edited this page Oct 3, 2017
In general, I really liked the experience and think this is the right direction for us to go in. I wanted to write this down for you before I forget.
Python
I tried Python first, and it took me about 1.5 days to put everything together, even with the example. The learning curve was steep for me. Some of the challenges I ran into:
- I needed to set up my environment in the Docker image. Customers have to install Anaconda and either run 'pip install' or set up PYTHONPATH. I did not run into problems here, but it is something customers have to do if they want to use Python for testing.
- I ran into a lot of strange problems because the Docker image sets PYTHONPATH and I had also run 'pip install' in my environment.
- I started out trying to generate test data in Python and then send the data to the composite operator under test. I struggled with this and could not get it to work. It was not clear how to interface between the Python code and the SPL code. I think this is going to be a challenge for any new user.
  - This is being improved: we are adding a variety of mechanisms for passing SPL tuples into Python and making the conversion of Python values to SPL more flexible.
- The error messages from Python are very cryptic, and it is hard to work out what went wrong (e.g. the error message I got when calling the 'map' function on something that is not a stream).
  - Some of this will be helped by adding type hints to the API, so that tools like PyCharm will flag when you are using the wrong object:
    - https://github.com/IBMStreams/streamsx.topology/issues/1227
- I believe people need a deep understanding of both SPL and the Python Application API to figure out how to do this (I hope the examples will help). I had a hard time wrapping my head around crossing between the two worlds and keeping track of things. My experience with Java and SPL helped, but I haven't done this for a long time, so it took some mental gymnastics to get the hang of it. I had the most trouble with SPL being a strongly typed language while Python is not, and with understanding how data types are handled when crossing that boundary.
- I really like how lightweight Python is. It's a scripting language for driving tests that does not need any compilation.
- I also really like how the tester works and the validations.
- I got tripped up when my test failed because tests fail fast: a test stops as soon as a failure is detected, for example when the number of tuples does not match. In my application I expected many more tuples to go through, so the error message made it confusing to figure out why my tests were failing.
  - The PR below helps a little: it documents how tests fail and time out, and fixes a couple of issues where a misleading error was raised when a test failed.
    - https://github.com/IBMStreams/streamsx.topology/pull/1255
- I think we need Python sources that can easily read CSV data from a file. I couldn't find a built-in function in our APIs to do this and had to write my own source. Most data scientists have large datasets in CSV files. It would be great if this were provided out of the box.
  - That's a good use case to work through. When we have the SPL standard toolkit easily integrated into Python, we will have a CSV reader/writer. It would be an interesting pattern to work through, and may require some reusable SPL composites in a com.ibm.streamsx.testing toolkit to help out.
  - Note that any solution would have to work on the Streaming Analytics service.
  - It might also be an idea to ensure the input can come from a URL that hosts the dataset, rather than having to bundle it with the test.
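Until a built-in reader exists, a hand-written CSV source can stay small. The sketch below is a hypothetical callable class (`CsvSource` is not part of streamsx.topology) intended to be passed to `Topology.source()`, which calls it and iterates over the result. It assumes the input is a local path or an http(s) URL, and takes an optional `types` mapping because the csv module yields every field as a string, while the SPL side is strongly typed.

```python
import csv
import io
import urllib.request


class CsvSource:
    """Hypothetical callable source yielding one dict per CSV data row.

    path_or_url: a local file path or an http(s) URL (assumption).
    types: optional {column_name: conversion_function}; columns not listed
    are left as strings.
    """

    def __init__(self, path_or_url, types=None):
        self.path_or_url = path_or_url
        self.types = types or {}

    def _open(self):
        if self.path_or_url.startswith(("http://", "https://")):
            # Wrap the binary HTTP response so the csv module reads text.
            response = urllib.request.urlopen(self.path_or_url)
            return io.TextIOWrapper(response, encoding="utf-8", newline="")
        return open(self.path_or_url, encoding="utf-8", newline="")

    def __call__(self):
        # Calling the instance returns a generator, i.e. an iterable of tuples.
        with self._open() as f:
            for row in csv.DictReader(f):
                yield {
                    name: self.types[name](value) if name in self.types else value
                    for name, value in row.items()
                }
```

A possible usage, assuming the file or URL is reachable from wherever the source runs (which matters on the Streaming Analytics service): `readings = topo.source(CsvSource("data.csv", types={"id": int, "value": float}))`. Keeping the conversion table next to the source makes the Python-to-SPL type expectations explicit in one place.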
Java
- Compared to Python, Java was much easier for me. It took me about 2 hours to set the tests up. I think part of the reason is that I had already done it in Python and was just writing the same thing in Java.
- I feel much more comfortable crossing the language boundary between Java and SPL than between Python and SPL.
- Python allows me to check multiple conditions in test validation, but in Java we only seem to be able to pass in one condition per test. Is it possible for Java to test for multiple conditions in one test?
- File-based test data and validation: in many of our tests, both the test data and the expected data are stored in files. In our test setups (both Java and Python), there does not seem to be an easy way to do this. I understand that we may want to move away from using files, but for testing it can be hard to generate the expected results in code. It may be easier to store the expected data in a file and just read it in.
Even though Python was hard to get started with, like you I still prefer Python to Java. Being able to run the tests without having to build a test harness is a big plus for me.
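One way to keep expected results in a file is a small loader that turns the file into a list of expected tuples. The sketch below uses a hypothetical helper, `load_expected`, and assumes a JSON Lines file (one JSON object per line) rather than CSV, because JSON keeps numbers and booleans typed instead of turning everything into strings.

```python
import json


def load_expected(path):
    """Hypothetical helper: read expected tuples from a JSON Lines file.

    Each non-blank line holds one JSON object describing one expected tuple.
    Returns the tuples as a list of dicts, in file order.
    """
    expected = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # allow blank lines for readability
            try:
                expected.append(json.loads(line))
            except json.JSONDecodeError as e:
                # Report the file and line so a bad fixture is easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON: {e}") from e
    return expected
```

In a Python test, the result could then be handed to a content check such as `tester.contents(stream, load_expected("expected.jsonl"))`, assuming the stream under test produces Python dicts with the same keys. The same file format could be read from Java as well, so both test setups could share one set of fixtures.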