I read an article on FireEye's website the other day in which they used machine learning to eliminate a lot of the noise that comes out of tools like strings. It's pretty interesting and looks like it would save me some time when looking through malware.
https://www.fireeye.com/blog/threat-research/2019/05/learning-to-rank-strings-output-for-speedier-malware-analysis.html
I wondered how effective freq.py scores would be in helping to eliminate the noise. 45 minutes and 29 lines of Python code later, I have something that looks like it works. Check out freq_sort.py.
Before freq_sort.py, here is the output of strings on a piece of malware:
student@573:~/freq$ strings -n 6 malware.exe | head -n 20
!This program cannot be run in DOS mode.
e!Rich
`.rdata
@.data
.pdata
@.gfids
@.rsrc
@.reloc
\$0u"H
L$ SVWH
K SVWH
|$ H;_
<bt%<xt!<Zt
|$ AVH
l$ VWAV
L$ SUVWH
UVWATAUAVAWH
0A_A^A]A\_^]
UVWATAUAVAWH
@A_A^A]A\_^]
After freq_sort.py, the useful strings quickly bubble to the top. It's not perfect, but the frequency tables are not tuned for EXEs. Frequency tables built from binaries would likely yield better results.
student@573:~/freq$ strings -n 6 malware.exe |python3 freq_sort.py | head -n 20
Failed to convert Wflag %s using mbstowcs (invalid multibyte string)
Failed to convert pypath to ANSI (invalid multibyte string)
Failed to convert pyhome to ANSI (invalid multibyte string)
8y@:"#@<*.
WARNING: file already exists but should not: %s
opyi-windows-manifest-filename freq_server.exe.manifest
Failed to get address for PyMarshal_ReadObjectFromString
INTERNAL ERROR: cannot create temporary directory!
Failed to get address for Py_FileSystemDefaultEncoding
Failed to convert executable path to UTF-8.
Failed to get address for Py_NoUserSiteDirectory
Cannot allocate memory for ARCHIVE_STATUS
Failed to get address for PyString_FromString
Failed to get address for PyString_FromFormat
Failed to get address for Py_IgnoreEnvironmentFlag
Failed to get address for PyUnicode_FromString
Failed to get address for PyObject_SetAttrString
Failed to convert progname to wchar_t
Failed to get address for PyUnicode_FromFormat
Failed to convert %s to ShortFileName
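To give a feel for why frequency scores push readable strings to the top, here is a minimal sketch of the general idea. It is not the code in freq_sort.py and it does not use freq.py's tables or API. The function names (build_table, score, rank) and the tiny training text are assumptions made up for illustration. It counts adjacent character pairs in some sample English, scores each string by its average pair count, and sorts highest first.

```python
from collections import Counter

# Tiny stand-in training text (an assumption for this sketch); a real
# frequency table would be built from a much larger corpus.
SAMPLE_TEXT = (
    "failed to convert the executable path cannot allocate memory for "
    "the archive status warning file already exists internal error cannot "
    "create temporary directory failed to get address for the string"
)


def build_table(text):
    """Count how often each adjacent character pair occurs in text."""
    text = text.lower()
    return Counter(zip(text, text[1:]))


def score(line, table):
    """Average pair count over the line; higher means more text-like."""
    line = line.lower()
    pairs = list(zip(line, line[1:]))
    if not pairs:
        return 0.0
    # Counter returns 0 for pairs never seen in the training text.
    return sum(table[p] for p in pairs) / len(pairs)


def rank(lines, table):
    """Return lines sorted from most to least text-like."""
    return sorted(lines, key=lambda s: score(s, table), reverse=True)


if __name__ == "__main__":
    table = build_table(SAMPLE_TEXT)
    # A few strings taken from the outputs shown in this post.
    sample = [
        "UVWATAUAVAWH",
        "|$ H;_",
        "Failed to convert progname to wchar_t",
        "Cannot allocate memory for ARCHIVE_STATUS",
    ]
    for line in rank(sample, table):
        print(line)
```

To try it on real strings output, the hard-coded sample list could be replaced with lines read from sys.stdin. With a training text this small the ranking is crude, which is the same tuning problem mentioned above.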
Give it a try and tell me what you think. If you find it useful or would like some features added, send me a note.
https://github.com/MarkBaggett/freq/blob/master/freq_sort.py
Mark