Workaround for 1,000+ Entries in Global Artifacts?

Currently putting together a workflow where we pass data from specific types of outbound e-mails to a global artifact, and I was wondering if there was a way to work around the 1,000 entry lookup limit if we were to run a query on it. We’d like to be able to return a list of results from that global artifact, but given the volume of entries going into the artifact I suspect it’s always going to be over 1,000. Ideally we’d like to be able to search all entries for relevant data, but to my knowledge there’s no way to search all of them once you pass 1,000.

Is there any way to work around this? Or is the limit on global artifacts going to make querying any entries past 1,000 impossible at this point?

I wouldn’t use a Global Artifact with that much data.

Can you save the data to a google sheet or to a small DB of some sort?

@valen_arnette how many entries are you expecting?

@tyler_terenzoni It’s hard to estimate exactly how many, but I know it’d far exceed 1,000. I set up a global artifact as a test Wednesday of last week and it’s already at 1,125 entries today.

Considering it’s data from certain employee e-mails, I don’t think a Google Sheet would be the right call. An internal DB might be a better option, but I’m unsure what the right plugin/setup would be for such a thing.

Gotcha, that makes sense. Do you mind me asking what the use case is? Will you always need to query all of the contents of the Global Artifact?

Yea, stay away from sheets if you’ve got sensitive data.

MySQL is pretty easy to set up and get running, and it’s free. We also have Elasticsearch, which might be an option too.
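To make the MySQL route concrete, here’s a minimal sketch of what writing entries to a table instead of a Global Artifact could look like, using the PyMySQL client. The host, table name, and columns are assumptions for illustration, not anything built into the product.

```python
# Minimal sketch: store outbound e-mail entries in MySQL instead of a global
# artifact. Host, credentials, and the table/column names are assumptions.
import pymysql

conn = pymysql.connect(
    host="db.internal.example.com",  # hypothetical host
    user="workflow",
    password="********",
    database="email_tracking",
)

try:
    with conn.cursor() as cur:
        # One-time setup: a simple table for the entries the workflow produces.
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS outbound_email (
                id INT AUTO_INCREMENT PRIMARY KEY,
                sender VARCHAR(255),
                recipient VARCHAR(255),
                has_attachment BOOLEAN,
                sent_at DATETIME
            )
            """
        )
        # Insert one entry; in a workflow step this would come from the trigger data.
        cur.execute(
            "INSERT INTO outbound_email (sender, recipient, has_attachment, sent_at) "
            "VALUES (%s, %s, %s, %s)",
            ("user@corp.example.com", "someone@gmail.com", True, "2024-01-10 09:30:00"),
        )
    conn.commit()

    # Querying is then not subject to the 1,000-entry lookup limit.
    with conn.cursor() as cur:
        cur.execute("SELECT sender, recipient FROM outbound_email WHERE has_attachment = TRUE")
        for sender, recipient in cur.fetchall():
            print(sender, "->", recipient)
finally:
    conn.close()
```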

There’s some other stuff we have, but it will be slightly more complicated. You could use a CSV file on an FTP server and write to that. You could also roll your own fairly easily if you’re just storing basic data.
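As a rough idea of the CSV-on-an-FTP-server approach: pull the file down, append a row, and push it back up. This is only a sketch with an assumed server, file name, and row layout.

```python
# Minimal sketch of the CSV-on-an-FTP-server idea: download the file, append
# a row, re-upload it. Host, credentials, and file name are assumptions.
import csv
import io
from ftplib import FTP

NEW_ROW = ["user@corp.example.com", "someone@gmail.com", "yes", "2024-01-10T09:30:00"]

ftp = FTP("ftp.internal.example.com")   # hypothetical server
ftp.login("workflow", "********")

# Download the current CSV; ignore the error if the file doesn't exist yet.
buf = io.BytesIO()
try:
    ftp.retrbinary("RETR outbound_email.csv", buf.write)
except Exception:
    pass

# Append the new entry on its own line.
text = buf.getvalue().decode("utf-8")
if text and not text.endswith("\n"):
    text += "\n"
out = io.StringIO(text)
out.seek(0, io.SEEK_END)
csv.writer(out).writerow(NEW_ROW)

# Upload the updated file, replacing the old one.
ftp.storbinary("STOR outbound_email.csv", io.BytesIO(out.getvalue().encode("utf-8")))
ftp.quit()
```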

I was looking through our library, and Git would be interesting: pull a file, read/write to it, then push it back. That would give you version history too.
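The Git pattern would look something like the sketch below, just shelling out to plain git commands. The repo URL, working directory, and file name are assumptions.

```python
# Minimal sketch of the Git approach: pull the repo, append to a tracked file,
# commit, and push it back. Repo URL and file name are assumptions.
import subprocess
from pathlib import Path

REPO = "git@git.internal.example.com:secops/email-artifacts.git"  # hypothetical
WORKDIR = Path("/tmp/email-artifacts")
DATA_FILE = WORKDIR / "outbound_email.csv"
NEW_LINE = "user@corp.example.com,someone@gmail.com,yes,2024-01-10T09:30:00\n"

def run(*args, cwd=None):
    subprocess.run(args, cwd=cwd, check=True)

# Clone on first run, otherwise pull the latest version.
if WORKDIR.exists():
    run("git", "pull", cwd=WORKDIR)
else:
    run("git", "clone", REPO, str(WORKDIR))

# Append the new entry to the tracked file.
with DATA_FILE.open("a") as f:
    f.write(NEW_LINE)

# Commit and push; the git history doubles as version history for the data.
run("git", "add", DATA_FILE.name, cwd=WORKDIR)
run("git", "commit", "-m", "Add outbound e-mail entry", cwd=WORKDIR)
run("git", "push", cwd=WORKDIR)
```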

FTP is another option for storage.

Finally, there’s the storage plugin, but it’s flaky and obsolete, so I wouldn’t recommend that route unless it’s an absolute last resort.

@valen_arnette I have been working with a workflow for a similar case, but mine was around 10,000 entries.
What I do is loop through a list to populate a Splunk KV store with the data, and I have a way to resume that loop using the Job Rerun and Array Diff functions.
I then have a separate workflow on a timer that runs every minute: it pulls one line from that KV store, enriches the data, then updates the line.
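For reference, here’s a rough sketch of that two-workflow pattern against the Splunk KV store REST API using the requests library. The host, app, collection name, fields, and token are assumptions, and the Job Rerun / Array Diff resume logic lives in the workflow itself, so it isn’t shown here.

```python
# Minimal sketch of the Splunk KV store pattern: one workflow inserts rows,
# a second workflow on a timer pulls one row, enriches it, and updates it.
# Host, app, collection, fields, and token are assumptions.
import requests

BASE = "https://splunk.internal.example.com:8089"   # hypothetical management port
COLLECTION = f"{BASE}/servicesNS/nobody/search/storage/collections/data/outbound_email"
HEADERS = {"Authorization": "Splunk <session-token>", "Content-Type": "application/json"}

# Workflow 1: insert one entry per loop iteration.
entry = {"sender": "user@corp.example.com", "recipient": "someone@gmail.com",
         "has_attachment": True, "enriched": False}
# verify=False only because Splunk management ports often use self-signed certs.
key = requests.post(COLLECTION, json=entry, headers=HEADERS, verify=False).json()["_key"]

# Workflow 2 (timer, every minute): pull one un-enriched row...
rows = requests.get(
    COLLECTION,
    params={"query": '{"enriched": false}', "limit": 1},
    headers=HEADERS,
    verify=False,
).json()

if rows:
    row = rows[0]
    row["enriched"] = True
    row["recipient_domain"] = row["recipient"].split("@")[-1]   # example enrichment
    # ...then write the updated row back under its _key.
    requests.post(f"{COLLECTION}/{row['_key']}", json=row, headers=HEADERS, verify=False)
```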

@tyler_terenzoni Ideally the use case is to check for users who are sending e-mail to personal webmail domains, with the most important part being anything that has an attachment. I spoke with Elijah Martin-Merrill about this today, and there may be ways we could reduce the overall volume that I’m going to start looking into internally. Some of what we’re seeing is likely expected business communication from members of our staff, so if we can filter that down I believe it’ll make the final numbers a little easier to swallow. At the end of the day, the workflow I’m envisioning may play a more supportive role alongside a full DLP tool, but I’ll have to see what I can do to cut the numbers down first.
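For what it’s worth, the volume reduction being described could start as simply as a filter step like the sketch below: keep only messages going to personal webmail domains, drop senders whose webmail traffic is expected business communication, and surface attachments first. The domain list and allowlist here are placeholders, not a complete policy.

```python
# Minimal sketch of the volume-reduction filter: keep only outbound messages
# to personal webmail domains, drop allowlisted senders, prioritize attachments.
# WEBMAIL_DOMAINS and EXPECTED_SENDERS are assumed example values.
WEBMAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "aol.com", "icloud.com"}
EXPECTED_SENDERS = {"recruiting@corp.example.com"}   # hypothetical allowlist

def should_record(message: dict) -> bool:
    """Return True if the message belongs in the tracking store."""
    recipient_domain = message["recipient"].rsplit("@", 1)[-1].lower()
    if recipient_domain not in WEBMAIL_DOMAINS:
        return False
    if message["sender"].lower() in EXPECTED_SENDERS:
        return False
    return True

messages = [
    {"sender": "user@corp.example.com", "recipient": "someone@gmail.com", "has_attachment": True},
    {"sender": "recruiting@corp.example.com", "recipient": "candidate@yahoo.com", "has_attachment": True},
    {"sender": "user@corp.example.com", "recipient": "partner@vendor.example.com", "has_attachment": False},
]

kept = [m for m in messages if should_record(m)]
# Attachments are the priority, so surface those first.
kept.sort(key=lambda m: not m["has_attachment"])
print(kept)
```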