this post was submitted on 21 Jun 2024
423 points (99.1% liked)

Software Gore

A community for posting software malfunctions

Deliberately bad software or bad design is not software gore, it must be something unintentional

Icon base by Delapouite under CC BY 3.0 with modifications to add a gradient and shear it



[–] [email protected] 72 points 4 months ago* (last edited 4 months ago) (5 children)

Those (?=...) bits are positive lookahead assertions:

Lookaround assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.

The one (?!...) is a negative lookahead assertion.

The $& var doesn't really matter outside of Perl. It contains the text that the last pattern matched, but even within Perl, capture groups are preferred. In Perl versions before 5.20, using it even once slows down every subsequent regex match in the program, which is especially bad in long-running web server environments. It still gets used sometimes in short scripts, though.

What really matters is that the lookaheads don't consume any text. In other words, the pointer that shows where in the text we are doesn't increment; once we're outside of the lookahead, we're still right back in the same place.
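To see the "doesn't consume any text" part in action, here is a minimal Python sketch (the sample strings are made up for illustration, and Python's re module is used instead of Perl, though the lookahead syntax is the same):

```python
import re

text = "hunter2"

# A positive lookahead on its own succeeds but matches the empty string:
# the match position never moves past index 0.
peek = re.match(r"(?=.*[0-9])", text)
print(peek.end())        # 0

# Because nothing was consumed, the rest of the pattern still starts
# matching at the beginning of the string.
both = re.match(r"(?=.*[0-9])hunter", text)
print(both.group(0))     # hunter

# A negative lookahead succeeds only when its subpattern fails.
no_space = re.match(r"(?!.*\s)", "two words")
print(no_space)          # None
```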

So let's break this down using the /x modifier to make it somewhat sane.

/^
(?!.*\s) # no whitespace allowed
(?=.{8,256}$) # between 8 and 256 characters (the '$' here indicating the end of the string)
(?=.*[a-z]) # has to be a lowercase ASCII alphabet char somewhere
(?=.*[A-Z]) # has to be an uppercase ASCII alphabet char somewhere
( # need either a digit or one of the special chars found on a US keyboard
    (?=.*[0-9]) 
    | (?=.*[~!@#$%^&*()-=_+[\]{}|;:,./<>?])
)
.* # consumes the whole string
$/x
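For anyone who wants to poke at it, roughly the same pattern can be written with Python's re.VERBOSE flag. This is a sketch, not the site's actual code. Two things here are my own assumptions: the group is made non-capturing, and the hyphen in the symbol class is escaped so it is a literal "-" (left unescaped, ")-=" is read as a character range that also happens to cover the digits). The is_valid name is hypothetical.

```python
import re

PASSWORD_RE = re.compile(
    r"""
    ^
    (?!.*\s)               # no whitespace anywhere
    (?=.{8,256}$)          # 8 to 256 characters in total
    (?=.*[a-z])            # at least one lowercase ASCII letter
    (?=.*[A-Z])            # at least one uppercase ASCII letter
    (?:
        (?=.*[0-9])        # at least one digit ...
      | (?=.*[~!@#$%^&*()\-=_+\[\]{}|;:,./<>?])   # ... or one listed symbol
    )
    .*                     # finally consume the whole string
    $
    """,
    re.VERBOSE,
)


def is_valid(password):
    """Return True when the password satisfies every lookahead above."""
    return PASSWORD_RE.match(password) is not None


print(is_valid("Abcdefg1"))   # True: digit satisfies the alternation
print(is_valid("Abcdefg!"))   # True: symbol satisfies the alternation
print(is_valid("Abcdefgh"))   # False: neither digit nor symbol
print(is_valid("Abc def1"))   # False: contains whitespace
```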

Notes:

  • Doesn't make any allowances for non-English characters, or even non-US characters (like the "£" character in the UK)
  • There's a whole slew of utf8 characters out there that should count towards "special characters", but aren't considered here
  • There's no reason to deny whitespace; let people use passphrases if they want (but then, you also don't want to block those people for not using symbols)
  • Putting a limit at 256 is questionable, but may not necessarily be wrong

That last one has some nuance. We often say you shouldn't put any upper limit, but that's generally not true in the real world. You don't want someone flooding an indefinite amount of data into any field, password or not. A large limit like this is defensible.

Also, lots of devs are surprised to learn that bcrypt only uses the first 72 bytes of its input (scrypt has no such limit). A way around this is to run your input through SHA-256 before giving it to bcrypt.
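A rough sketch of that pre-hashing step in Python, using only the standard library. The function name is hypothetical, and the base64 step is my own addition: raw SHA-256 output can contain NUL bytes, which some bcrypt implementations treat as a terminator. The actual bcrypt call is left as a comment because it needs a third-party package.

```python
import base64
import hashlib


def prehash_for_bcrypt(password):
    """Return a fixed-length encoding of an arbitrarily long password."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()  # 32 raw bytes
    # Base64 keeps the result free of NUL bytes; 32 bytes encode to
    # 44 ASCII characters, comfortably under bcrypt's 72-byte limit.
    return base64.b64encode(digest)


# Hypothetical usage with the third-party `bcrypt` package (not imported here):
#   hashed = bcrypt.hashpw(prehash_for_bcrypt(pw), bcrypt.gensalt())

# Two long passwords that differ only after byte 72 still produce
# different inputs for bcrypt.
a = prehash_for_bcrypt("x" * 100 + "a")
b = prehash_for_bcrypt("x" * 100 + "b")
print(len(a), a != b)   # 44 True
```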

[–] [email protected] 11 points 4 months ago (1 children)

Honestly, white space is a character, and adds extra entropy to passwords. I do not understand why people do not want to promote using white space in passwords/passphrases. If I’m missing something intrinsically bad about white space in passwords, I’d love to know.

[–] [email protected] 5 points 4 months ago (1 children)

You're right on. As long as you're otherwise following best practices for storing passwords, there's no downside.

[–] [email protected] 12 points 4 months ago (1 children)

But but but if I add it to the queryparams for my rest endpoint the space will break my URL!

[–] [email protected] 3 points 4 months ago

Always urlencode your passwords!

Wait, that doesn't seem right...

[–] [email protected] 10 points 4 months ago

As someone who spent many years as a Perl developer, I immediately recognized the incantations to the regex gods of old, heh. Great explanation!

[–] [email protected] 63 points 4 months ago (1 children)

password must be valid regex

[–] [email protected] 14 points 4 months ago

Bets on what percentage of users on that site have that exact regex string as their password? 10%?

Comedy answer: this is one of those sites that doesn't let 2 people use the same password so it's only 1 person

[–] [email protected] 60 points 4 months ago (6 children)

I use a password manager with a random password generator. It's always disconcerting when I find a website that finds my passwords to be too complicated. Like "you can't use more than eight characters and the only special characters you can use are @ and !". What the shit?!?

[–] [email protected] 17 points 4 months ago (1 children)

We have a system that mails your password if you change it. It's just for internal users, but still.

[–] [email protected] 13 points 4 months ago (1 children)

That means those suckers are either stored in plaintext or stored with a decryption key that lives somewhere on the server. Yeesh.

[–] [email protected] 11 points 4 months ago

"if you change it". It might send the email before storing it as a salted hash in the DB. Unlikely, but possible.

[–] [email protected] 13 points 4 months ago

"you may only use characters that we can store in a plaintext SQL field"

Oh man I fuckin hate that shit

[–] [email protected] 11 points 4 months ago* (last edited 4 months ago)

generate 32-char-pw -> "Must not be longer than 20" 🤨

generate 32-char-pw -> "you must include a specific special character" 🤨

below 10 characters is truly atrocious - and thankfully rare

[–] [email protected] 8 points 4 months ago (1 children)

Typically, the account creation will fail without saying why.

Is it because the site is broken? Because I already have an account? Because I used too weird a password? (10 minutes later) ok, it's because it's coded by idiots and it can't handle a 24 character password but a 12 character one works.

[–] [email protected] 9 points 4 months ago (1 children)

I once experienced a site just silently truncating a password that was too long. Such a ridiculous thing to do. It was several years ago, gaming related. I think it might have been Ubisoft, but I'm not sure that I'm remembering that correctly.

[–] [email protected] 4 points 4 months ago

I'm sure that it silently happens a lot.

[–] [email protected] 2 points 4 months ago

I only remember that happening once, but it wasn't some random super small site, it was Uplay. I think the limit was 14 characters, or maybe 16, I'm not quite sure, but either way it was utterly stupid.

[–] [email protected] 2 points 4 months ago* (last edited 4 months ago) (1 children)

Yeah! Why can't I use a base64 representation of a pirated 4k TS copy of Jon Favreau's "Chef" as my password? /s

Jokes aside, I've heard some hashing algorithms have a high cap of like 20 characters, so developers are probably just too lazy to switch them out or to read the docs on how to properly use said algorithms. Either way it's a very bad sign, maybe just a tad better than them emailing you the password in cleartext.

[–] [email protected] 6 points 4 months ago (1 children)

The worst I have seen recently is one with an eight character limit and support for only four specific special characters. I didn't test if it was case sensitive but it wouldn't shock me if it was not. It is the invoicing portal for one of my clients. I wish that was the only technical atrocity committed by that abomination... it is not.

[–] [email protected] 1 points 4 months ago

My work only recently did away with the requirement for passwords to be exactly 8 characters. This was due to the use of legacy mainframes afaik.

[–] [email protected] 31 points 4 months ago* (last edited 4 months ago)

Explanation:

That is the regex string for that site's password field. A regex is a sequence of characters that defines a pattern; here it's used in code to check whether the input matches that pattern (there are other uses too, but that's what's being done here). Sites normally don't show the regex pattern, since it's a pain to parse even if you know how to write regex, and to people who don't code it looks like random output. I'm assuming a bug prints the wrong error string, so this shows up instead of the human-readable one.

[–] [email protected] 24 points 4 months ago

When the developer just passes on the js error to the user 👍🏾

[–] [email protected] 19 points 4 months ago (1 children)

I know regex!

This means:

  • Must not contain whitespace

  • Must contain lowercase latin letter

  • Must contain uppercase latin letter

  • Must contain a number

  • Must contain one of the symbols you'd normally be able to type on US keyboard !@#$%^&*()-=_+[\]{}|;:,./<>?

It is a cursed way to do validation, though.

[–] [email protected] 7 points 4 months ago

Technically just needs a number or a special character, there's a | between the lookaheads for numbers and special characters.

[–] [email protected] 11 points 4 months ago (1 children)

Kind of related question: why are no whitespaces allowed in many passwords while special characters are? I'm a huge fan of elaborate nonsense sentence passphrases but get shot down.

(I ask cause that regex has that requirement it seems)

[–] [email protected] 8 points 4 months ago (1 children)

I have no idea if this is true or not but I was told it harkens back to very early multi-user operating systems where user credentials were stored unencrypted in plaintext files that used white space as delimiters.

I tend to believe this might be accurate because I learned programming back in the 1980s on an Onyx Systems microcomputer. There was a bug that some of us learned about in its rudimentary email program that would dump you into its otherwise-protected system directory. In that directory was a file containing both usernames & passwords in clear text. I don’t recall if it used white space as a delimiter, but given everything was in clear text and not encrypted I think that might have been the case.

[–] [email protected] 7 points 4 months ago (3 children)

Oh boy, having done data science work with government files, you remind me that they still use terrible delimiters. A white space delimiter sounds significantly worse than a tab delimited file, though!

[–] [email protected] 3 points 4 months ago (1 children)

I never use tab delimiters but thinking about it, it is much less common to encounter a tab character in a CSV field than a comma...

[–] [email protected] 2 points 4 months ago (1 children)

Tabs are also usually not allowed in many fields. The thing is, tab delimiters are fine, but the data sets often get stored without file extensions. Let me tell ya, I was the only person on staff to even know what a file extension was, let alone how to load it into software that can process tab delimiters!

[–] [email protected] 2 points 4 months ago

Ugh. Bless you

[–] [email protected] 2 points 4 months ago* (last edited 4 months ago)

I learned COBOL programming on that system. COBOL’s sequential file data type was all about space-delimited text files. Part of a program would define the various input & output files. For example a numerical userid might take up columns 1-8 then the first initial would be in column 10 then the last name in columns 12-20 and so on…
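For anyone who hasn't met that kind of file, here is a small Python sketch of reading one such record by column position. The layout follows the example above (userid in columns 1-8, first initial in column 10, last name in columns 12-20). The record contents and all the names in the code are made up for illustration.

```python
from typing import NamedTuple


class UserRecord(NamedTuple):
    user_id: str
    first_initial: str
    last_name: str


def parse_user_record(line):
    # The columns in the description are 1-based and inclusive, so
    # columns 1-8 become the slice [0:8], column 10 is index 9, and
    # columns 12-20 become the slice [11:20].
    return UserRecord(
        user_id=line[0:8].strip(),
        first_initial=line[9:10],
        last_name=line[11:20].strip(),
    )


# Build a hypothetical 20-column record; padding fills unused columns.
sample = "00001234" + " " + "J" + " " + "DOE".ljust(9)
record = parse_user_record(sample)
print(record)
```

Note that a space inside a field is perfectly legal here, because the position decides where a field ends, not the space itself.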

[–] [email protected] 2 points 4 months ago

Tabs are considered white space. Loosely speaking, white space is any character that takes up space without drawing a visible glyph. That covers things like spaces, vertical/horizontal tabs, non-breaking spaces, etc. Whether zero-width spaces count depends on the tool doing the checking.
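What actually counts as white space depends on who is asking. As a rough illustration, this Python sketch checks a few candidates against the regex \s class, both in the default Unicode mode and with the re.ASCII flag. The candidate list is my own choice, and other regex engines may answer differently.

```python
import re

candidates = {
    "space": " ",
    "tab": "\t",
    "vertical tab": "\v",
    "non-breaking space": "\u00a0",
    "zero-width space": "\u200b",
}

# Map each name to (matches \s in Unicode mode, matches \s in ASCII mode).
results = {}
for name, ch in candidates.items():
    unicode_ws = re.fullmatch(r"\s", ch) is not None
    ascii_ws = re.fullmatch(r"\s", ch, re.ASCII) is not None
    results[name] = (unicode_ws, ascii_ws)
    print(name, unicode_ws, ascii_ws)
```

In Python at least, the non-breaking space only matches in Unicode mode, and the zero-width space does not match \s at all, so a "no whitespace" rule like the one in the regex would let it through.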

[–] [email protected] 11 points 4 months ago

Reminds me of The Password Game…

Warning: you have been warned!

[–] [email protected] 8 points 4 months ago* (last edited 4 months ago)

Jesus, what a terrible regex. I love regexes and use them frequently, but you could just, y'know, declare your requirements and then check they're being met using string methods. Min length 8, max length 256, one set/dict/map for each character class, the minimum count for each character class, and then loop over the string and check that your declared requirements are being met. A regex might be faster (if the regex engine isn't being asked to do crazy lookaround shit), but why torture yourself? Just parsing the string is also nice because it's readable and makes frontend documentation easier to generate.
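Something like this, as a minimal Python sketch of the declare-then-check idea. The names are hypothetical, the symbol list is copied from the regex in the screenshot, and "digit or symbol" is treated as a single class because the regex only asks for one of the two.

```python
import string

MIN_LENGTH = 8
MAX_LENGTH = 256

# One set per character class, plus the minimum count required from each.
REQUIRED_CLASSES = {
    "lowercase letter": (set(string.ascii_lowercase), 1),
    "uppercase letter": (set(string.ascii_uppercase), 1),
    "digit or symbol": (
        set(string.digits) | set("~!@#$%^&*()-=_+[]{}|;:,./<>?"),
        1,
    ),
}


def password_problems(password):
    """Return a list of readable problems; an empty list means it passes."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters")
    if len(password) > MAX_LENGTH:
        problems.append(f"must be at most {MAX_LENGTH} characters")
    if any(ch.isspace() for ch in password):
        problems.append("must not contain whitespace")

    # Single pass over the string, counting hits per character class.
    counts = {name: 0 for name in REQUIRED_CLASSES}
    for ch in password:
        for name, (chars, _minimum) in REQUIRED_CLASSES.items():
            if ch in chars:
                counts[name] += 1

    for name, (_chars, minimum) in REQUIRED_CLASSES.items():
        if counts[name] < minimum:
            problems.append(f"must contain at least {minimum} {name}")
    return problems


print(password_problems("Abcdefg1"))   # []
print(password_problems("abcdefgh"))
```

The nice side effect is that the error strings come straight out of the declared requirements, so the user sees "must contain at least 1 uppercase letter" instead of the raw regex.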

Or skip all of this shit and just require longer passwords. My company has mandated 16 character passwords with no character class requirements for years and it's great. Want to use a password manager? You're set. You a big fan of passphrases? correct_horse_battery_staple your way through that shit. A long password + 2FA is all you need for security.

edit: also fuck you apparently if you want to have a ñ or ü or (⁠・⁠o⁠・⁠;⁠) in your password. I'm guessing the database column for this only supports ASCII? Smells like smelly MySQL/mariaDB to me.

edit: well, Unicode might be allowed. I get turned around with all of the groups and references. I guess it also depends on how the regex is being compiled. I know that in Python you can pass a bitwise flag to re.compile to force ASCII.
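A quick Python check of that, for what it's worth. This is only a sketch of how Python's re module behaves, not of whatever engine the site uses. The flag in question is re.ASCII, and it only changes shorthand classes like \w, while an explicit range like [a-z] is ASCII-only either way.

```python
import re

n_tilde = "\u00f1"  # the letter ñ as a single code point

word_default = re.fullmatch(r"\w", n_tilde) is not None
word_ascii = re.fullmatch(r"\w", n_tilde, re.ASCII) is not None
explicit_range = re.fullmatch(r"[a-z]", n_tilde) is not None
any_char = re.fullmatch(r".", n_tilde) is not None

print(word_default)    # True: \w is Unicode-aware by default
print(word_ascii)      # False: re.ASCII restricts \w to ASCII
print(explicit_range)  # False: [a-z] never matches ñ
print(any_char)        # True: "." accepts it
```

So in a pattern built from ".", "[a-z]" and "[A-Z]", a ñ would be accepted as part of the password but would not count towards the lowercase requirement.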

[–] [email protected] 7 points 4 months ago

"Oh fuck it, it's already 4 PM on Friday."

[–] [email protected] 4 points 4 months ago

copy-pastes the text provided

[–] [email protected] 3 points 4 months ago

Stand back, boys.. cracks knuckles I've played Fallout 3 & 4!

[–] [email protected] 3 points 4 months ago

Fuck this website.

[–] [email protected] 3 points 4 months ago

well clearly your password does not match that

[–] [email protected] 2 points 4 months ago

Yeah? What's not clicking?

[–] [email protected] 2 points 4 months ago (3 children)

I should add this as a default option in my "reverse"-regex text generator library. It'd be neat to have a cli tool for generating random passwords.

(Please respond with your favorite bash/python/powershell one liner for doing this.)

[–] [email protected] 3 points 4 months ago

I use this when I don’t have Bitwarden generator available:

openssl rand -base64 64

[–] [email protected] 2 points 4 months ago

tr -d '\n' < /dev/random | head -c256 | LANG=C sed 's/[^\x21-\x7E]//g' | head -c3

If you can figure out how to make sed stop after it outputs a specific number of characters, the head -c256 can be dropped.
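For the Python side of the request, a minimal sketch using the standard secrets module. The function name and the default length of 32 are my own choices. It draws directly from the same printable range (\x21 to \x7E) that the sed filter keeps, so there is nothing to throw away afterwards.

```python
import secrets

# Printable, non-space ASCII: the same \x21-\x7E range the sed filter keeps.
PRINTABLE = [chr(code) for code in range(0x21, 0x7F)]


def random_password(length=32):
    """Return `length` characters chosen uniformly from PRINTABLE."""
    return "".join(secrets.choice(PRINTABLE) for _ in range(length))


print(random_password())
```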

[–] [email protected] 2 points 4 months ago (1 children)

Ooh, you have a library that generates text to match regexes? I'd be interested to see it! That's something I've actually had a need for. Hypothesis has something like that for property-based testing, but I couldn't make use of it in the context I needed it.

[–] [email protected] 3 points 4 months ago

I do, and people do seem to use it for testing (hilariously not a use case I'd initially considered when writing it), but I'm pretty lax about maintaining it. I noticed today that the dependencies are quite out of date and non-trivial to update. If you'd like to check it out, it's here: https://crates.io/crates/regex_generate/0.2.3

If you'd like a more updated version, there are a few forks but also someone seems to have taken the concept and run a little farther with it: https://crates.io/crates/rand_regex

That one seems more explicitly for testing and might be suitable to your needs. These are both Rust crates but should be usable from any language with a C compatible FFI.