And yet another question about yield return
So I need to execute different SQL scripts remotely. The scripts are in TFS, so I get them from TFS automatically, and the process iterates through all the files, reading their content into memory and sending the content to the remote SQL servers.
So far the process works flawlessly. But now some of the scripts will contain bulk inserts, increasing the size of the script to 500,000 MB or more.
So I built the code "thinking" that I was reading the content of one file at a time into memory, but now I have second thoughts.
This is what I have (over simplified):
public IEnumerable<SqlScriptSummary> Find(string scriptsPath)
{
    if (!Directory.Exists(scriptsPath))
    {
        throw new DirectoryNotFoundException(scriptsPath);
    }

    var path = new DirectoryInfo(scriptsPath);
    return path.EnumerateFiles("*.sql", SearchOption.TopDirectoryOnly)
        .Select(x =>
        {
            var script = new SqlScriptSummary
            {
                Name = x.Name,
                FullName = x.FullName,
                Content = File.ReadAllText(x.FullName, Encoding.Default)
            };
            return script;
        });
}
....
public void ExecuteScripts(string scriptsPath)
{
    foreach (var script in Find(scriptsPath))
    {
        _scriptRunner.Run(script.Content);
    }
}
My understanding is that EnumerateFiles will yield return each file one at a time, which is what made me "think" that I was loading one file at a time into memory.
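For example, this tiny sketch (with Console.WriteLine standing in for the real file read) shows the deferred behaviour I'm relying on:

using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static IEnumerable<string> Load()
    {
        // Select is deferred: the lambda runs per item, during iteration
        return new[] { "a.sql", "b.sql" }
            .Select(name =>
            {
                Console.WriteLine($"Loading {name}");
                return name;
            });
    }

    static void Main()
    {
        var scripts = Load();                  // nothing printed yet
        Console.WriteLine("Before iteration");
        foreach (var s in scripts)             // "Loading a.sql" appears here,
        {                                      // one element at a time
            Console.WriteLine($"Got {s}");
        }
    }
}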
Once I'm iterating them in the ExecuteScripts method, what happens with the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?
If it remains in memory, that means that even when I'm using iterators and internally using yield return, by the time I've iterated through all of them they are all still in memory, right? So at the end it would be like using ToList, just with lazy execution. Is that right?
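To make the comparison concrete, this is the difference I'm asking about (using my Find method from above):

// Lazy: at any time, only the current script's Content is reachable
// (assuming _scriptRunner.Run doesn't hold on to it).
foreach (var script in Find(scriptsPath))
{
    _scriptRunner.Run(script.Content);
}

// Eager: ToList() materializes every SqlScriptSummary - and therefore
// every script's Content - before the first one is run.
var all = Find(scriptsPath).ToList();
foreach (var script in all)
{
    _scriptRunner.Run(script.Content);
}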
If the script variable is disposed when it goes out of scope, then I think I would be fine.
How could I re-design the code to optimize memory consumption, e.g. to force the content of only one script at a time to be loaded into memory?
Additional questions:
How can I test (unit/integration test) that I'm loading just one script at a time in memory?
How can I test (unit/integration test) that each script is released/not released from memory?
Once I'm iterating them, what happens with the script variable used in the foreach loop after it goes out of scope? Is it disposed, or does it remain in memory?
If you mean in the ExecuteScripts method - there's nothing to dispose, unless SqlScriptSummary implements IDisposable, which seems unlikely. However, there are two different things here:

- The script variable goes out of scope after the foreach loop, and can't act as a GC root.
- Each object that the script variable has referred to will be eligible for garbage collection when nothing else refers to it... including script on the next iteration.

So yes, basically that should be absolutely fine. You'll be loading one file at a time, and I can't see any reason why there'd be more than one file's content in memory at a time, in terms of objects that the GC can't collect. (The GC itself is lazy, so it's unlikely that there'd be exactly one script in memory at a time, but you don't need to worry about that side of things, as your code makes sure that it doesn't keep live references to more than one script at a time.)
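If you want to observe that for yourself, here's a rough sketch using WeakReference. It's illustrative only, as GC behaviour is non-deterministic (and debuggers can extend lifetimes):

using System;
using System.Collections.Generic;

class LifetimeDemo
{
    static IEnumerable<byte[]> Scripts()
    {
        for (int i = 0; i < 3; i++)
        {
            yield return new byte[100_000_000]; // stands in for one big script
        }
    }

    static void Main()
    {
        WeakReference previous = null;
        foreach (var script in Scripts())
        {
            if (previous != null)
            {
                GC.Collect();
                // Typically prints False: the previous iteration's object
                // became eligible for collection once 'script' moved on.
                Console.WriteLine($"Previous still alive? {previous.IsAlive}");
            }
            previous = new WeakReference(script);
        }
    }
}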
The way you can test that you're only loading a single script at a time is to try it with a large directory of large scripts (that don't actually do anything). If you can process more scripts than you have memory, you're fine :)
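An integration-test sketch along those lines might look like this (NUnit-style; ScriptFinder is a stand-in for whatever class hosts your Find method, and the sizes are made up - pick values whose total clearly exceeds your available memory):

using System.IO;
using NUnit.Framework;

[TestFixture]
public class FindMemoryTests
{
    [Test]
    public void CanProcessMoreScriptDataThanFitsInMemory()
    {
        var dir = Path.Combine(Path.GetTempPath(), "big-scripts");
        Directory.CreateDirectory(dir);
        try
        {
            // e.g. 50 dummy files of ~200 million characters each
            for (int i = 0; i < 50; i++)
            {
                File.WriteAllText(
                    Path.Combine(dir, $"script{i}.sql"),
                    new string('x', 200_000_000));
            }

            long total = 0;
            foreach (var script in new ScriptFinder().Find(dir))
            {
                total += script.Content.Length; // touch each script once
            }

            // If we get here without OutOfMemoryException, the scripts
            // weren't all held in memory at once.
            Assert.That(total, Is.GreaterThan(0));
        }
        finally
        {
            Directory.Delete(dir, recursive: true);
        }
    }
}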
See more on this question at Stackoverflow