Skip to main content

Welcome to Mark Baggett - In Depth Defense

I am the course Author of SANS SEC573 Automating Information Security with Python. Check back frequently for updated tools and articles related to course material.




What are the last 4 digits of your SSN?

Note to readers:  This very old blog entry still gets a pretty high amount of traffic.  I rarely, but sometimes do get nasty emails from people telling me that I am "teaching people to steal identities".   If we assume that I am the only person in the world capable of reading the instructions published on the Social Security Administrations (SSA) website, that explain the same thing I have here, then I agree with you.  But if that were true then people not smart enough to understand the SSA's website would probably be stumped by this as well, so we are still safe.   If I was the only one who could understand that website that might make me the smartest person in the world!! (in which case who are you to question my wisdom?)  I asked my wife about the possibility that I was the smarted person in the world and she confirmed for me that I am not (she is still laughing for some reason).  
The purpose of this site is to make people aware of the danger of simply sharing the last 4 digits of your SSN.   When a company asks you for your last 4 digits,  Tell them no and send them a copy of this site.   Your SSN is a form of identification (like a account name), not authentication (like a password).   The problem is that many places today use a very predictable piece of data to authenticate who you are (ie your SSN as a 'password').  Now that you know it is predictable (which won't change whether this blog exists or not) you can fight to change the way organizations use it (which could change with an educated populous).   Screaming fire in a crowded movie theater is a good thing when the building is on fire.  Don't shoot the messenger.

Note 2:  Because people are aware of how easily SSNs could be predicted the SSA changed the way they issued numbers in 2011.   If you were issued a SSN after 2011 then the information below does not apply to you.  Your SSN can not be predicted based upon your geography and birth date.   Even so, your SSN is only a form of identification (like your name), it is not a form of authentication (like a password).  Anyone using your SSN as a password, security code or something else in an attempt to prove who you are is misusing it.  Here is a link to the new policy for issuing SSNs:  https://www.ssa.gov/employer/randomization.html    If you received a SSN prior to 2011 then the information below is still relevant to you.

Follow me on Twitter for information security and hacker news:  @MarkBaggett

Original blog entry:
“What are the last 4 digits of your SSN?” Nowadays, it seems to be accepted as a standard question to validate your identity. But throw in “What is your date of birth” and “What is your birth place?” and you may have given away your identity. I don't think it would be uncommon to find those three questions asked together in many cognitive password reset systems. Last week I answered the question with my bank and it made me wonder how predictable is my SSN with the rest of the information my bank has on me. I did a little research and sure enough, it seems feasible to me that with a few pieces of info and your last four an attacker could reasonable predict your SSN. The number of permutations are certainly low enough to make a brute force attack feasible.

First of all lets clarify something. The question “Where were you born?” is probably a good indicator of the actual question that needs to be asked which is “In what state did you apply for a SSN?” And “What is your birth date?” is not as accurate as “What date did you apply for a SSN?” But in a non-scientific polling of people I have asked it seems that your probably close enough.

Now lets look at predicting your SSN…

Your SSN is in the following format AAA-BB-CCCC. AAA is a number that represents the state in which you applied for the SSN. These numbers well documented and available on the Social Security Administrations website. For example, Were you born in Nevada? Your SSN starts with 530. New Mexico? 525. Most states have a range of a few digits. But lets say you were issued your SSN in New Mexico and you gave me your last 4; with no other information it will only require 99 guesses to guarantee I will predict your SSN.. In 1973 these numbers became even more closely tied to your geography. Now all number are issued by the central office in Baltimore based upon the ZIPCODE of the submitter. So those numbers can be predicted based upon the date your SSN was applied for and/or your zip code. But brute forcing 99 whole possibilities, that could take a while. But perhaps its even easier than that.

The second group of digits (BB) are handed out in a semi-sequential, but still chronological order. Therefore with the correct insight into which numbers where issued at what time you could predict this information. A good explanation of how these numbers are issued is in the “GROUP NUMBER” section on this site. http://www.usrecordsearch.com/ssn.htm

So what would it take to build a database of middle number and when they were issued? Well, looks like the SSA has already done that for us and published it on their website. They have what they refer to as the “High group number”. Every month they predict what the highest middle digits are for each of the geographic codes. The numbers can be found here…

http://www.ssa.gov/employer/ssnvhighgroup.htm

So in April 2006 the middle digits for the first three state codes (born in New Hampshire) were :
001 (First 3 digits) = 04 (Middle 2 digits)
002 = 02
003  = 02

Then in May of 2006 they became
001=04
002=04 (the next group according to their sequence)
003=02

In October of 2006 geographic code 003 began issuing number with 04 as the middle two
001=04
002=04
003=04

In May of 2007 geographic code 001 began issuing number with 06 as the middle two.
001=06
002=04
003=04

Today the history of "high groups" only date back to November 2003 on the main website. But 4 years seems to be long enough to determine how quickly the digits in various geographical areas change. That information combined with data from other public sources such as the number of births in a state in a given year would be helpful in establishing a prediction database.

Reading these descriptions it is obvious that numbers are issued chronologically based upon geography of the requester. So how difficult would it be for a computer to either accurately predict or come reasonably close such that a brute force is reasonable.


Here is a table of states and SSN geographic codes

001-003 NH 400-407 KY 530 NV
004-007 ME 408-415 TN 531-539 WA
008-009 VT 416-424 AL 540-544 OR
010-034 MA 425-428 MS 545-573 CA
035-039 RI 429-432 AR 574 AK
040-049 CT 433-439 LA 575-576 HI
050-134 NY 440-448 OK 577-579 DC
135-158 NJ 449-467 TX 580 VI Virgin Islands
159-211 PA 468-477 MN 581-584 PR Puerto Rico
212-220 MD 478-485 IA 585 NM
221-222 DE 486-500 MO 586 PI Pacific Islands*
223-231 VA 501-502 ND 587-588 MS
232-236 WV 503-504 SD 589-595 FL
237-246 NC 505-508 NE 596-599 PR Puerto Rico
247-251 SC 509-515 KS 600-601 AZ
252-260 GA 516-517 MT 602-626 CA
261-267 FL 518-519 ID 627-645 TX
268-302 OH 520 WY 646-647 UT
303-317 IN 521-524 CO 648-649 NM
318-361 IL 525 NM *Guam, American Samoa,
362-386 MI 526-527 AZ Philippine Islands,
387-399 WI 528-529 UT Northern Mariana Islands

650-699 unassigned, for future use
700-728 Railroad workers through 1963, then discontinued
729-799 unassigned, for future use
800-999 not valid SSNs. Some sources have claimed that numbers
above 900 were used when some state programs were converted
to federal control, but current SSA documents claim no
numbers above 799 have ever been used.


http://www.usrecordsearch.com/ssn.htm
http://en.wikipedia.org/wiki/Social_Security_number
http://www.ssa.gov/employer/ssnvhighgroup.htm

Popular posts from this blog

Awesome Keyboard Tricks - Clevo/Sager Backlight control from Powershell

I'm back on Windows.   After 8 years on a Macintosh I just couldn't go another day with ONLY 16GB of RAM.   I priced it out and for the cost of a top of the line MacBook I could get a tricked out PC with 32GB of ram and 2.5 TB or hard drive space (1.5 of it being SSD).   So I made the switch.  To get a top performing laptop I ended up buying a gaming machine from xoticpc.com.   The model is Sager NP9752 ( Clevo P750ZM ).    I have to say I like it quite a bit.    One of the features I was curious about was the "Programmable backlit keyboard".   With it you can set your keyboard backlight to various colors and light movement patterns.    Now, when I hear "programmable" I think APIs.   I was a little disappointed to find out there weren't any documented APIs that I could use to control the keyboard.    Your only choice is to use their built in tool to configure the lights on the keyboard.   That stinks.  I want to be able to change key colors automatically

SRUM-DUMP and SRUM_DUMP_CSV Ported to Python 3

SRUM_DUMP and SRUM_DUMP_CSV have been ported to Python3 and are available for download from the PYTHON3 branch of my github page. https://github.com/MarkBaggett/srum-dump/tree/python3 In moving to Python3 I also updated the modules that I depend upon to parse and create XLSX files and access the ESE database that contains the SRUM data.  I hope that this will fix the issue that some users have experienced with SRUDB.dat files that create very large spreadsheets.  If it does not please let me know and continue to use SRUM_DUMP_CSV.EXE to avoid the XLSX problem. In moving to Python3 you will find the process to be faster. If you would like to run the tools from source instructions for doing so are in the README on the github page.

New tool Freq_sort.py

I read an article on Fireeye's website the other day where they used Machine Learning to eliminate a lot of the noise that comes out of tools like strings.  It's pretty interesting and looks like it would save me some time when looking through malware. https://www.fireeye.com/blog/threat-research/2019/05/learning-to-rank-strings-output-for-speedier-malware-analysis.html I wondered how effective freq.py scores would be in helping to eliminate the noise.  45 minutes and 29 lines of Python code later I have something that looks like it works.  Check out freq_sort.py. Before freq_sort.py here is the output of strings on a piece of malware: student@573:~/freq$ strings -n 6 malware.exe | head -n 20 !This program cannot be run in DOS mode. e!Rich `.rdata @.data .pdata @.gfids @.rsrc @.reloc \$0u"H L$ SVWH K SVWH |$ H;_ <bt%<xt!<Zt |$ AVH l$ VWAV L$ SUVWH UVWATAUAVAWH 0A_A^A]A\_^] UVWATAUAVAWH @A_A^A]A\_^] After freq_sort.py the useful stings quickly bubble to t

Security Onion getting the most from Freq.py and Domain States

My talk at Security Onion conference has been posted and is available for viewing here.

SRUM DUMP and SRUM DUMP CSV Updated

An issue was reported where is some conditions SRUM_DUMP would stop processing and print the following error to the screen. UnboundLocalError: local variable 'sid_str' referenced before assignment The issue was that sometimes the SRUM database had entries in it that were all zeros. OrderedDict([('IdType', 3), ('IdIndex', 38127), ('IdBlob', '0000000000000000')]) I've released an update that handles the anomoly althought I do not understand the circomstances of why Windows would record all zero's for as the user SID. The issue was fixed and new versions of both SRUM DUMP and SRUM DUMP CSV were released.