Hi, I'm spidering a site and reading its contents, and I want to keep the request rate reasonable; approximately 10 requests per second should be OK. Currently I'm making about 5,000 requests per minute, which is causing security issues because it looks like bot activity. How can I do this? Here is my code:
protected void IterateItems(List<Item> items)
{
    foreach (var item in items)
    {
        GetImagesFromItem(item);
        if (item.HasChildren)
        {
            IterateItems(item.Children.ToList());
        }
    }
}

protected void GetImagesFromItem(Item item)
{
    var document = new HtmlWeb().Load(completeUrl);
    var urls = document.DocumentNode.Descendants("img")
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !string.IsNullOrEmpty(s))
        .ToList();
}
You can use System.Threading.Semaphore to control the maximum number of concurrent threads/tasks. Here is an example:
var maxThreads = 3;
var semaphore = new Semaphore(maxThreads, maxThreads);
for (int i = 0; i < 10; i++) // 10 tasks in total
{
    var j = i;
    Task.Factory.StartNew(() =>
    {
        semaphore.WaitOne();
        Console.WriteLine("Start " + j.ToString());
        Thread.Sleep(1000);
        Console.WriteLine("End " + j.ToString());
        semaphore.Release();
    });
}
You can see that at most 3 tasks are working at any time; the others are blocked in semaphore.WaitOne() because the maximum limit has been reached, and a pending task continues only once a running task releases the semaphore with semaphore.Release().
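Applied to the crawler in the question, the same semaphore can double as a rough rate limiter if each task holds its permit for a minimum amount of time: with 3 permits, each held for at least 300 ms, the overall rate cannot exceed about 10 requests per second. A minimal sketch, assuming HtmlAgilityPack's HtmlWeb; the class, method, and the idea of passing a plain URL string per page are illustrative, not the asker's exact API:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using HtmlAgilityPack; // provides HtmlWeb

public class ImageCrawler
{
    private const int MaxConcurrent = 3;

    // One permit per in-flight request; each permit is held for at least
    // MaxConcurrent * 100 ms, so the rate stays near 3 / 0.3 s = 10 req/s.
    private static readonly Semaphore Throttle =
        new Semaphore(MaxConcurrent, MaxConcurrent);

    public static Task FetchImageUrlsAsync(string pageUrl)
    {
        return Task.Factory.StartNew(() =>
        {
            Throttle.WaitOne();
            try
            {
                var document = new HtmlWeb().Load(pageUrl);
                var urls = document.DocumentNode.Descendants("img")
                    .Select(e => e.GetAttributeValue("src", null))
                    .Where(s => !string.IsNullOrEmpty(s))
                    .ToList();
                // ... process urls ...
            }
            finally
            {
                // Keep the permit occupied for ~300 ms before releasing it,
                // even if the download failed, to enforce the rate cap.
                Thread.Sleep(MaxConcurrent * 100);
                Throttle.Release();
            }
        });
    }
}
```

Note the Release() in a finally block: without it, an exception while loading a page would leak a permit and slowly starve the crawler.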