If we are to use should_grp_score_cos(x,y) as a filter, the following
relationship must hold (from least to most expensive):

    should_grp_score_len(x,y)
        >= should_grp_score_cos(x,y)
        >= grp_score(x)

should_grp_score_cos(x,y) wasn't holding up its part of the bargain, so
real data was used to generate a fudge curve to bring
should_grp_score_cos(x,y) results into the same space. Really this is a
terrible hack and the problem needs more thought. Evaluation of
should_grp_score_cos(x,y)'s performance benefit (given the relaxation of
the filter under the fudge curve) is sorely needed.
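
To make the relaxation concrete, here is a minimal sketch of the corrected
filter decision in isolation. cossim_correction() is copied from the patch.
passes_cos_filter() is a hypothetical helper, not part of strgrp, that takes
the raw cosine similarity as an argument so the arithmetic can be checked
without a strgrp context.

```c
#include <stdbool.h>

/* Correction curve copied from the patch: an inverted parabola that
 * peaks at s = 0.5 (adding 0.33) and falls to +0.08 at s = 0 and s = 1. */
static inline double
cossim_correction(const double s)
{
	return -((s - 0.5) * (s - 0.5)) + 0.33;
}

/* Hypothetical helper (an assumption for illustration only): the same
 * comparison the patched should_grp_score_cos() performs, with the raw
 * cosine similarity passed in directly. */
static inline bool
passes_cos_filter(const double threshold, const double cossim)
{
	const double corrected = cossim + cossim_correction(cossim);
	return threshold <= corrected;
}
```

With a threshold of 0.8, a raw similarity of 0.5 now passes the filter
(0.5 + 0.33 = 0.83), where the uncorrected comparison would have rejected
it. A raw similarity of 0.4 is still rejected (0.4 + 0.32 = 0.72). Note that
the corrected value is not confined to [0, 1]: a raw similarity of 1.0 maps
to 1.08.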
* License: LGPL
* Author: Andrew Jeffery <andrew@aj.id.au>
*
- * Ccanlint:
- * tests_pass FAIL
- * tests_pass_without_features FAIL
- *
* Example:
* FILE *f;
* char *buf;
/* Low-cost filter functions */
+static inline double
+cossim_correction(const double s)
+{
+ return -((s - 0.5) * (s - 0.5)) + 0.33;
+}
+
static inline bool
should_grp_score_cos(const struct strgrp *const ctx,
struct strgrp_grp *const grp, const char *const str) {
- return ctx->threshold <= strcossim(ctx->pop, grp->pop);
+ const double s1 = strcossim(ctx->pop, grp->pop);
+ const double s2 = s1 + cossim_correction(s1);
+ return ctx->threshold <= s2;