this post was submitted on 29 Apr 2024
323 points (100.0% liked)
Programming
17509 readers
380 users here now
Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!
Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.
Hope you enjoy the instance!
Rules
Rules
- Follow the programming.dev instance rules
- Keep content related to programming in some way
- If you're posting long videos try to add in some form of tldr for those who don't want to watch videos
Wormhole
Follow the wormhole through a path of communities [email protected]
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
I don't know how you could make a server faster than with flask honestly (which I guess is similar to spacy ?), so these results are very surprising. Especially against java, java is so freaking verbose.
It was a mixture of factors.
Data was to be dumped into a S3 bucket (minio), this created an event and anouther team had built an orchestrator which would do a couple of things but eventually supply an endpoint a reference to a plain/txt file for analysis.
For the Java devs they had to
[modify the example camel docs.](https://camel.apache.org/manual/rest-dsl.html)
and use the built in jackson library to convert the incoming object to a class. This used the default AWS S3 api to create a stream handle which fed into the OpenNLP docs. .The Python project first hit a wall in setting up Flask. They followed the instructions and it didn't work from setup tools. The Java team had just created a new maven project from the Intelij but the same approach didn't work for the Python team using pycharm. It lost them a couple days, I helped them overcome it.
Then they hit a wall with Boto3, the team expected to stream data but Boto3 only supports downloading, there was also a complexity issue the AWS SDK in Java waa about 20 lines to setup and a single line to call, it was about 50 lines in Python. On the positive side I got to explain what all the config meant in S3.
This caused the team anouther few days of delay because the team knew I used a 350MiB Samsung TV guide to test the robustness and had to go learn about Docker volume mounting and they thought they needed a stateful kubernetes service and I had to explain why that was wrong.
Basically Python threw up a lot of additional complexity and the docs weren't as helpful as they could have been.
I am not familiar with AWS apis. Probably Java was more suitable for this task then, as it often comes down to how good the documentation and tutorials are for the task and how many people use it. Like you said bad documentation is a big issue.
I can also see how they could struggle with python dependencies, which I guess is the reason why they struggled setting up flask. With a java project you can just nuke everything from scratch and reinstall the correct compatible packages. But with python global dependencies (without env) you cannot get around the issue like that. That makes it harder for beginners. (But for intermediate give me anything but maven please).
Maybe Python flask wasn't't the tool for the task here, but I still think it's overall better and especially faster to use for most things.