Variety in DLP Filters
MyDLP is currently adding new predefined rule patterns to its filter collection. We are working on three new patterns:
- Canada SIN
- France INSEE
- UK NINO
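To give an idea of what such a predefined pattern involves, here is a minimal sketch in Python for the Canada SIN case. It is not MyDLP's actual filter code; the names `SIN_CANDIDATE`, `luhn_valid` and `find_canada_sins` are made up for illustration. It relies on two facts: a SIN has nine digits, and it is validated with the Luhn checksum. The grouping with spaces or dashes is an assumption about how the number usually appears in documents.

```python
import re
from typing import List

# Hypothetical pattern: nine digits, optionally grouped 3-3-3 with spaces or dashes.
SIN_CANDIDATE = re.compile(r"\b(\d{3})[ -]?(\d{3})[ -]?(\d{3})\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_canada_sins(text: str) -> List[str]:
    """Return the nine-digit candidates in text that pass the Luhn check."""
    found = []
    for match in SIN_CANDIDATE.finditer(text):
        digits = "".join(match.groups())
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    # 046 454 286 is a commonly used sample number that passes the checksum.
    print(find_canada_sins("Employee SIN: 046 454 286"))
```

The checksum step matters for a DLP filter because a bare nine-digit pattern would match many unrelated numbers; the Luhn check discards most of those false positives.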
New Bayesian Classifier Engine for MyDLP
Previously, we developed a Bayesian Classifier Engine in Java because of its dependency on zemberek, a Turkish NLP library. However, this engine caused us difficulties in several areas, such as distribution, performance and maintenance.
A week ago, we decided to develop a very simple Turkish NLP module for MyDLP. This was a good decision because zemberek offered far more than we needed. We weren’t using most of its features, and for every request we had to push a large binary through a Thrift bridge. The large memory footprint of the Java process was also a disadvantage.
We now use bayeserl with our own very simple Turkish NLP module. Results are more accurate and performance has improved.
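For readers unfamiliar with the approach, the following is a minimal sketch of a naive Bayes text classifier with a very simple tokenizer in front of it. It is written in Python for illustration only and does not reflect the bayeserl API or our Turkish NLP module; the `tokenize` function, the `NaiveBayes` class and the labels in the example are all hypothetical. The tokenizer just lowercases and splits on non-word characters, where a real Turkish module would also need to handle suffixes and Turkish-specific casing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Crude stand-in for an NLP step: lowercase and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()

    def train(self, label, text):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Work in log space: log prior plus the log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("confidential", "gizli maaş bordrosu raporu")
    nb.train("public", "hava durumu raporu")
    print(nb.classify("maaş bordrosu"))
```

A classifier of this kind needs only token counts per category, which is why a small tokenizer can be enough and a full NLP library is not strictly required.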
Try it, use it.
Any comments and questions are very welcome.