File-based hashing is done in so many places; there's so much heat.
Sub-file hashing with feature engineering is necessary for AV, which must take packing, obfuscation, loading, and dynamic analysis into account, in addition to zip archives and file magic numbers.
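A minimal sketch of what sub-file hashing could look like, assuming SHA-256, fixed-size chunks, and zip as the only container format. The function names `chunk_hashes` and `zip_member_hashes` are hypothetical, and real AV engines would use far richer features than this (unpacking, fuzzy hashes, dynamic analysis):

```python
import hashlib
import io
import zipfile


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def chunk_hashes(data: bytes, chunk_size: int = 4096) -> list[str]:
    """Hash each fixed-size chunk so partial overlaps between files can be seen.

    Fixed-size chunking is an assumption for illustration; an inserted byte
    shifts every later chunk, which is why content-defined chunking exists.
    """
    return [
        sha256_hex(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def zip_member_hashes(data: bytes) -> dict[str, str]:
    """Hash every file packed inside a zip archive, keyed by member name.

    Returns an empty dict when the bytes are not a zip archive.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {
            info.filename: sha256_hex(zf.read(info))
            for info in zf.infolist()
            if not info.is_dir()
        }
```

A whole-file hash changes when any byte changes; the per-member and per-chunk hashes are what let two differently packed files be recognized as sharing content.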
AV (antivirus) applications with LLMs: what do you train them on, and what are some of the existing signature databases?
https://SigStore.dev/ (The Linux Foundation) also has a hash-file inverted index for released artifacts.
Also, otoh, with a time limit:
1. What file is this? Dirname, basename, hash(es)
2. Is it supposed to be installed at that path?
3. Per its header, is the file an archive, an image, or a document?
4. What file(s), records, and fields are packed into the file, and what transforms were applied to the data?
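Questions 1 and 3 above can be sketched roughly as follows. This is only an illustration: the `MAGIC` table, `sniff_kind`, and `identify` are hypothetical names, the magic-number list is a tiny sample of well-known signatures, and questions 2 and 4 are not covered because they need a package database and format-specific unpackers.

```python
import hashlib
import os

# A few well-known magic numbers. A real tool would use a much larger
# table (libmagic, for example); this short list is only for illustration.
MAGIC = [
    (b"PK\x03\x04", "archive (zip)"),
    (b"\x1f\x8b", "archive (gzip)"),
    (b"\x89PNG\r\n\x1a\n", "image (png)"),
    (b"\xff\xd8\xff", "image (jpeg)"),
    (b"%PDF-", "document (pdf)"),
]


def sniff_kind(header: bytes) -> str:
    """Question 3: classify a file by the leading bytes of its header."""
    for magic, kind in MAGIC:
        if header.startswith(magic):
            return kind
    return "unknown"


def identify(path: str, chunk_size: int = 1 << 16) -> dict:
    """Question 1: dirname, basename, and hashes, read in a single pass."""
    digests = {name: hashlib.new(name) for name in ("sha1", "sha256")}
    header = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            if not header:
                header = chunk[:16]  # enough bytes for the table above
            for digest in digests.values():
                digest.update(chunk)
    abspath = os.path.abspath(path)
    return {
        "dirname": os.path.dirname(abspath),
        "basename": os.path.basename(abspath),
        "hashes": {name: d.hexdigest() for name, d in digests.items()},
        "kind": sniff_kind(header),
    }
```

Reading the file once and feeding every digest from the same chunks keeps the I/O cost flat no matter how many hash algorithms are wanted, which matters under a time limit.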