discourse/lib/text_sentinel.rb

#
# Given a string, tell us whether or not it is acceptable.
#
class TextSentinel

  attr_accessor :text

  def initialize(text, opts=nil)
    @opts = opts || {}
    @text = text.to_s.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
  end

  def self.body_sentinel(text, opts={})
    entropy = SiteSetting.body_min_entropy
    if opts[:private_message]
      scale_entropy = SiteSetting.min_private_message_post_length.to_f / SiteSetting.min_post_length.to_f
      entropy = (entropy * scale_entropy).to_i
    end
    TextSentinel.new(text, min_entropy: entropy)
  end

  def self.title_sentinel(text)
    TextSentinel.new(text,
                     min_entropy: SiteSetting.title_min_entropy,
                     max_word_length: SiteSetting.max_word_length)
  end

  # Entropy is a rough count of the unique characters in the string.
  # Non-ASCII characters are weighted more heavily since they carry more
  # "information": each one expands to several quoted-printable tokens.
  def entropy
    @entropy ||= begin
      chars = @text.to_s.strip.split('')
      chars.pack('M*' * chars.size).gsub("\n", '').split('=').uniq.size
    end
  end

  def valid?
    @text.present? &&
    seems_meaningful? &&
    seems_pronounceable? &&
    seems_unpretentious? &&
    seems_quiet?
  end

  private

  def symbols_regex
    /[\ -\/\[-\`\:-\@\{-\~]/m
  end

  def seems_meaningful?
    # Minimum entropy if entropy check required
    @opts[:min_entropy].blank? || (entropy >= @opts[:min_entropy])
  end

  def seems_pronounceable?
    # At least some non-symbol characters
    # (We don't have a comprehensive list of symbols, but this will eliminate some noise)
    @text.gsub(symbols_regex, '').size > 0
  end

  def seems_unpretentious?
    # Don't allow super long words if there is a word length maximum
    # (max is nil for text with no words, so to_i treats that as length 0)
    @opts[:max_word_length].blank? || @text.split(/\s/).map(&:size).max.to_i <= @opts[:max_word_length]
  end


  def seems_quiet?
    # We don't allow all-uppercase content in English
    !((@text =~ /[A-Z]+/) && !(@text =~ /[^[:ascii:]]/) && (@text == @text.upcase))
  end

end
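
The entropy and upper-case checks above can be tried outside Rails with a small standalone sketch. The helper names `quoted_printable_entropy` and `shouting?` are hypothetical and not part of the class; they only mirror the logic of `entropy` and `seems_quiet?` under the assumption that plain Ruby strings are passed in.

```ruby
# Hypothetical standalone helpers mirroring TextSentinel's logic (not part of the class).

# Quoted-printable encode each character separately, then count the unique
# tokens between '=' separators. An ASCII character yields one token; a
# non-ASCII character yields one token per byte plus an empty leading token.
def quoted_printable_entropy(text)
  chars = text.to_s.strip.split('')
  chars.pack('M*' * chars.size).gsub("\n", '').split('=').uniq.size
end

# True when the text is ASCII-only, contains at least one A-Z letter,
# and is unchanged by upcasing (that is, it is written entirely in capitals).
def shouting?(text)
  text.match?(/[A-Z]/) && text.ascii_only? && text == text.upcase
end

quoted_printable_entropy("aaa")         # => 1
quoted_printable_entropy("hello world") # => 8
quoted_printable_entropy("\u00e9")      # => 3 ("", "C3", "A9")

shouting?("HELLO WORLD")  # => true
shouting?("Hello world")  # => false
shouting?("\u00c9COLE")   # => false (non-ASCII text is exempt)
```

A single accented character scores as much as three distinct ASCII letters, which is why short non-English titles can still pass a minimum-entropy threshold.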