If you landed here without reading the first blog post of the series, it’s probably a good idea to check it out before proceeding. Otherwise, happy reading!
Next question, please!
What Frothly VPN user generated the most traffic? Answer guidance: Provide the VPN user name.
BOTSv3 question #330, 1000 points
Now is a good time to list which data sources are available in the dataset, since we need a bit more context to get around this one. Once we find a target sourcetype, we drill down from there.
The following approach is also useful for reporting on data availability and other volume-related metrics. The key command here is tstats.
Which data sources are available from the dataset?
Even though the list is available on the dataset’s GitHub page, running a query is more didactic:
| tstats count where index=botsv3 by sourcetype
| sort - count

Just make sure you select “All time” from the Time Picker and optionally select an overlay (scale) from the count column so that it gives you a better visual representation.
If you are not familiar with tstats or indexed fields, Google suggested the following blog post from the folks at Deductiv (US Splunk partner?), which seems like a good intro to the command:
Fun (or Less Agony) with Splunk Tstats
The important thing to know is that the above query runs very fast because tstats reads indexed fields (metadata) instead of the raw events, while still covering every event and giving us a very good view of potential sourcetypes to narrow our search down.
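As a sketch of the data availability reporting mentioned earlier, the same tstats search can also return the first and last event time per sourcetype. The field names first_seen and last_seen are just illustrative labels chosen here, not something defined by the dataset:

```spl
| tstats count min(_time) AS first_seen max(_time) AS last_seen where index=botsv3 by sourcetype
| eval first_seen=strftime(first_seen, "%Y-%m-%d %H:%M:%S"), last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
| sort - count
```

Since _time is an indexed field, this should stay about as fast as the plain count.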
There are over 100 sourcetypes and the question asks for a VPN username. That means any firewall, VPN or remote access data sources are potential candidates here. But I’ve got another idea.
Is it possible to list all sourcetypes in which a given field is successfully extracted? Oh yes! In this case, we should consider fields storing account names as well as information on traffic volume.
For the experienced Splunker, and following Splunk best practices on normalization, we are likely talking about user (or src_user) and bytes (or bytes_out) fields.
“Splunk, show me all sourcetypes with either one or the other (field)!”

The results show that only 22 sourcetypes carry user and/or bytes info within their events. The eval coalesce() function returns the first non-null value from its list of arguments, which means user is assigned the value of either user or src_user. The same goes for bytes, which takes the value of either bytes or bytes_out.
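A search along the following lines produces that kind of summary. Treat it as a minimal sketch rather than the exact search used here: the initial filter and the sort order are assumptions, while user_dcount and bytes_sum are the column names referred to in this post.

```spl
index=botsv3 (user=* OR src_user=* OR bytes=* OR bytes_out=*)
| eval user=coalesce(user, src_user), bytes=coalesce(bytes, bytes_out)
| stats dc(user) AS user_dcount sum(bytes) AS bytes_sum by sourcetype
| sort - bytes_sum
```

Unlike the tstats search, this one relies on search-time field extractions, so it has to read the raw events and will take noticeably longer over “All time”.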
So now it’s easier to select our candidates. And the first guess is pretty obvious: cisco-asa.
Unless you want to check stream:* first, Cisco ASA seems like the best choice since it’s a well-known network device and it even provides info on both fields we are looking for (check the user_dcount and bytes_sum values).
What’s in Cisco ASA events?
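One way to get a feel for these events is to break them down by action. This is only a sketch: the sourcetype is written as cisco:asa, the name used by the Splunk Add-on for Cisco ASA, which is an assumption here, so adjust it to whatever the sourcetype listing shows in your environment.

```spl
index=botsv3 sourcetype=cisco:asa user=*
| stats count dc(user) AS user_dcount sum(bytes) AS bytes_sum by action
| sort - count
```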

So apparently, those events are generated once a connection is finished (action=”teardown”) and they also contain user and traffic volume info. Looks good for stacking them up by user, right? Go!
The Answer
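Stacking the teardown events by user can be sketched as follows. The sourcetype name cisco:asa and the final head 5 are assumptions made for illustration; the action, user and bytes fields are the ones discussed above.

```spl
index=botsv3 sourcetype=cisco:asa action=teardown user=*
| stats sum(bytes) AS bytes_sum by user
| sort - bytes_sum
| head 5
```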

So the user account mkraeusen tops the list, making it the best guess, and the answers spreadsheet confirms it is the right one.
Note: BOTS is also a way to promote Splunk Community and Premium Apps, so this question, like many others, may be answered by checking predefined dashboards and other ES interactive panels.
Until next question!